
STATUS OF THE CLAIMS 

Claims 1-11, 16-27 and 32 were pending.

Claims 1-11, 16-27 and 32 have been subjected to an election requirement 
under PCT Rule 13.1. 

Claims 1, 6, 16 and 26 have been objected to as they include non-elected
sequences. 

Claims 1-2, 6, 16, and 26 are objected to because the sequences are referred 
to by figure number, rather than by SEQ. ID. NO. 

Claims 1-2, 4-8, 11 and 32 have been rejected under 35 U.S.C. §101 as being
directed to non-statutory subject matter. 

Claims 1-2, 4-8, 11, 16-27 and 32 have been rejected under 35 U.S.C. §101
because the claimed invention is not supported by either a specific asserted utility or 
a well established utility. 

Claims 1, 4, 6-8, 11, 16-27 and 32 have been rejected under 35 U.S.C. §112
for lack of enablement. 

Claims 1-2, 4-8, 11, 16-27 and 32 have been rejected under 35 U.S.C. §112
for indefiniteness. 

Claims 1-2, 4, 6-8, 11 and 32 have been rejected under 35 U.S.C. §102(b) as
being anticipated by Burton, et al. (1995, GenBank Accession No. X80009 and Plant 
J. 8:3-15). 

Claims 1-2, 4 and 11 are rejected under 35 U.S.C. §102(b) as being
anticipated by Fisher, et al. (1996, GenBank Accession No. U22428 and Plant Mol.
Biol. 30:97-108). 

Claims 1-2, 4, 6-8, 11, 16-17, 22-27 and 32 have been rejected under 35
U.S.C. §103 as being unpatentable over Hofvander, et al. (WO 92/11375) in view of
Burton et al. and Fisher, et al. 

Claims 1-3, 6, 16-18, and 20-27 have been amended. 

Claims 1-11, 16-27 and 32 are presented for reconsideration. 

REMARKS 

Applicants' argument against restriction of claims 1-11, 16-27 and 32 was
deemed unpersuasive by the Examiner, and the requirement was made final. The Examiner states that
"claim 1 is directed to a nucleic acid encoding a polypeptide having any SBE activity"
and "as the nucleic acid taught by Cooke, et al. encodes a polypeptide having starch
branching activity and shares at least one amino acid with SEQ ID NO 29 or 31,
Cooke et al renders the technical feature nonspecial."

Applicants would like to clarify that SEQ ID NOS. 29 and 31 show the 
sequence for full length cassava SBE II sequences (see the descriptions of Figures 4 
and 13 on pages 9 and 11 of the published application). Claim 1 is directed to a
nucleic acid sequence encoding a polypeptide having SBE II activity, not any SBE
activity. In contrast, Cooke discloses altering potato plants by using SBE I, a
distinguishing point of novelty. The fact that the polypeptides "share at least one amino acid" is
irrelevant in that there are a limited number of amino acids and most sequences 
"share at least one amino acid." For example, a human being shares at least one 
amino acid with the SBE I sequence of Cooke, yet the Examiner cannot possibly be
suggesting that a human being lacks novelty over that invention. Therefore, the present
application has a unique and special technical feature over the prior art, the
requirement of unity of invention is met, and Applicants respectfully request that the restriction
requirement be withdrawn.

Claims 1-2, 6, 16, and 26 are objected to because the sequences are referred 
to by figure number, rather than by SEQ. ID. NO. The claims have been amended so 
as to refer to the sequences by sequence identification numbers, thus overcoming 
this objection.

The claims have been amended to clarify that the nucleic acid sequences 
encode polypeptides having SBE II activity in cassava. Such amendment does not 
change the scope of the claims as it is clear from the specification and the original 
wording of the claims that SBE II activity is intended. 

In view of the foregoing, Applicants respectfully request that the election requirement
be reconsidered and withdrawn, and that claims 1-11, 16-27 and 32 be examined on the merits.

A substitute specification excluding the claims was required under 37 C.F.R.
§1.125(a) because the specification did not use an assigned sequence identifier in every
instance where a sequence was discussed and its print quality was considered faint and
irregular. Such substitute specification is included herewith as Appendix C, along with
a marked-up version as Appendix D. No new matter is included.
The drawings were objected to for the reasons noted on the Notice of
Draftsperson's Patent Drawing Review. Formal drawings have been prepared which 
correct the cited informalities and were submitted under separate cover. 

Claims 1, 6, 16 and 26 have been objected to because they include non-elected
sequences. These claims have not been amended as the Examiner's
reasoning for upholding the election requirement was incorrect and Applicants have 
respectfully requested reconsideration in light of the provided clarification. 

Claims 1-2, 6, 16 and 26 have been objected to because the sequences are
referred to by figure number rather than by sequence identifier number. These claims
have been amended to substitute the sequence identifier numbers for the figure 
numbers. 

Claims 1-2, 4-8, 1 1 and 32 have been rejected under 35 U.S.C. §101 as being 
directed to non-statutory subject matter as they allegedly read on a product of nature. 
Applicants were the first to sequence and isolate the SBE II polypeptide, making it a 
new composition and therefore patentable. See Parke-Davis v. Mulford, 196 F. 496 (2d Cir.
1912). The claims have been amended according to the Examiner's
suggestion to clarify this distinction. 

Claims 1-2, 4-8, 11, 16-27 and 32 have been rejected under 35 U.S.C. §101
because the claimed invention is not supported by either a specific asserted utility or 
a well established utility. The Examiner states that the claims are drawn to nucleic 
acids which "include those that encode SBE I enzymes." Applicants respectfully 
traverse. 

The present invention is drawn to nucleic acid sequences "comprising at least 
an effective portion of the amino acid sequences of SEQ ID NO. 29 or SEQ. ID. NO. 
31", both sequences being SBE II polypeptides. Such effective portion confers
the SBE II functionality of branching starch molecules, thereby decreasing the relative
amount of amylose in the starch of modified plants. In contrast, it has been shown
that SBE I does not provide the same functionality. As the present invention
claims nucleic acid sequences encoding SBE II polypeptides and the application 
teaches a specific utility for such SBE II polypeptides, the rejection has been 
overcome. 

Claims 1, 4, 6-8, 11 and 16-27 have been rejected under 35 U.S.C. §112 for 
lack of enablement as the invention is allegedly not supported by either a specific 
asserted utility or a well established utility. The utility point has already been 
addressed above under the similar 35 U.S.C. §101 rejection. As the claims have
utility, they are enabled such that one skilled in the art clearly would know how to use 
the claimed invention. 

Claims 1-2, 4-8, 11, 16-27 and 32 have been rejected under 35 U.S.C. §112 
for lack of enablement as the specification allegedly does not enable one skilled in 
the art to make and/or use the invention commensurate in scope with the claims. The 
Examiner states that the specification fails to provide guidance as to which amino acids
of SEQ ID NO. 29 can be altered, and to which other amino acids, and which cannot be altered,
while maintaining SBE II activity. The Examiner states that given the claim breadth,
unpredictability, and lack of guidance, undue experimentation would be required by 
one skilled in the art to develop and evaluate nucleic acids that encode a multitude of 
effective portions of SEQ ID No 29 which hybridize to SEQ ID No 28, methods of their 
use and plants transformed with them. Applicants respectfully traverse. There is no 
undue experimentation needed to develop the methods of use and the transformed
plants. The rejection with respect to the claims directed to sequences is
addressed below. 

The determination of what constitutes undue experimentation in a given case 
requires the application of a standard of reasonableness, having due regard for the 
nature of the invention and the state of the art. The factors to be considered have been 
summarized as the quantity of experimentation necessary, the amount of direction or 
guidance presented, the presence or absence of working examples, the nature of the 
invention, the state of the prior art, the relative skill of those in that art, the predictability 
or unpredictability of the art and the breadth of the claims. Ex parte Forman, et al., 230
USPQ 546, 547 (1986).

The test is not merely quantitative, since a considerable amount of
experimentation is permissible if it is merely routine, or if the specification in question
provides a reasonable amount of guidance with respect to the direction in which the 
experimentation should proceed to enable the determination of how to practice a desired 
embodiment of the invention claimed. Genetic manipulation and antisense technology 
are so well known in the art that manipulating the provided sequences and testing
them in the antisense mode is merely routine experimentation. Thus, even if each
variation of the disclosed sequences needed to be tested to determine if it was still 
effective to suppress amylopectin formation (starch branching), undue experimentation 
would not be required. 
However, each and every variation would not need to be tested. The specification
provides guidance as to which portions are effective in that they retain sufficient SBE 
activity. For example, the specification states that the transit peptide is not essential for 
SBE activity and thus may be modified or even deleted without loss of the functionality. 
The specification also states that N-terminal amino acid residues up to the proline elbow 
typically do not need to be conserved to retain functionality. Further, several working 
examples are given to guide one skilled in the art. 

The invention pertains to effective portions of only two sequences, which encode a
single enzyme functionality. Importantly, such SBE II sequences are known for other 
plants, as disclosed in the specification (see Figure 8). It is a well-established technique 
to conduct amino acid sequence alignments for homologous proteins from different 
sources. Such comparisons reveal those portions of the polypeptides which have been 
evolutionarily conserved (and are therefore presumably more critical to functionality) and 
those portions which are more variable and which will therefore tolerate substitutions 
without significant detrimental effect on functionality. Guidance in this respect is given in 
Figure 8 and the associated discussion (page 18). Further guidance is given by the fact
that more than one sequence is provided by Applicants, as supported by Bowie, et al. [of 
record] at page 1309, right hand column, which states that as there "is more information 
in a set of related sequences than in a single sequence ... such information permits the 
evaluation of a residue's importance to the function and stability of a protein." 

The relative skill of those in the art is extremely high. Screening is also common in
the art and, although results are not always predictable, one skilled in the art knows what types
of substitutions generally will or will not retain functionality.

The claims are not overly broad in that they deal with a single enzyme and a single 
functionality. Further, the nucleic acid sequence must contain the effective portion of the 
sequences disclosed as SEQ ID NOS 29 and 31.

The Examiner cites several references to show the unpredictability of the art. 
However, the relationship between homology and functionality is not consistent across 
all polypeptides. For example, though changing one amino acid in a sequence may 
cause the polypeptide to lose its functionality in the art of growth factors, plant branching
enzymes are not as sensitive and are more predictable. Further, it is dependent upon 
the types of substitutions made and where they are made. Some guidance for 
substitution is given in the specification. Finally, the number of substitutions which can 
be made is dependent upon the length of the sequence and the effective portion thereof. 
Regarding the citation of Broun (Science 282:1315 (1998)), the Examiner states
that a change of four amino acids resulted in a significant change. This is true only
as to the four essential amino acids out of a total of seven amino acid residues. The
currently claimed sequence is on the order of 100 times longer and thus differs
substantially in the number of modifications which may be made without affecting function.

Applicants contend that many of the other references cited by the Examiner
have similar flaws in being used against the predictability of modifications which may be 
made in the presently claimed sequences. The more relevant references cited are those 
which deal with SBE activity and these will now be addressed. 

According to the Examiner, Kossman, et al teach that severe reduction of the 
levels of potato SBE RNA by antisense technology resulted in no change in chain length 
distribution or size of the amylopectin structure in potato. However, Kossman states that
potato contains only one known isoform of SBE, namely SBE I. Modification of
starch properties in potato plants by preventing expression of this "single known" SBE
gene was not successful because there are in fact two SBE isoforms in potato. It is the SBE II gene
which is essential in modification. See for example EP 826 061 (Jobling, et al.). This is 
further evidenced in the Jobling reference cited by the Examiner (Plant Journal 
18(2):163 (1999)). Neither of these references shows any unpredictability in the art of
SBE II enzymes. In fact, the Jobling references show that SBE II functionality is 
maintained in partial sequences. 

In view of the above, the 35 U.S.C. §112 rejection regarding undue 
experimentation is overcome. 

Claims 1-2, 4-8, 11, 16-27 and 32 have been rejected under 35 U.S.C. §112
for lack of written description as containing subject matter that was allegedly not described
in such a way as to reasonably convey to one skilled in the art that the inventors had
possession of the invention. The Examiner states that the claims are not limited to 
nucleic acids that only encode SBE II enzymes nor does the specification indicate if 
SEQ ID NO 29 is an SBE II A or B. Applicants would like to point out to the Examiner 
that there are two SBE isoforms, SBE Class A also known in the art as SBE II and 
SBE Class B also known in the art as SBE I. SEQ ID NO 29 specifies that the 
sequence encoded is an SBE II (i.e., Class A). See for example page 14, line 21, of
WO 98/20145, from which the present application claims priority. Thus, the rejection 
has been overcome. 
Claims 1-2, 4-8, 11, 16-27 and 32 have been rejected under 35 U.S.C. §112
for indefiniteness. Claim 1 was deemed indefinite due to the term "effective portion." 
Applicants respectfully disagree. The term effective portion is defined in the 
specification as the portion which retains sufficient SBE II activity of the SBE enzyme 
to complement the glycogen branching deficient mutation in E. coli KV832 and give a
positive result as assayed by iodine staining (see page 17, paragraphs 2 and 5). 
Several sequences containing such effective portion are disclosed in the specification 
as well as guidance as to where the effective portion lies. Further, this term is well 
known in the art and understood by those skilled therein. See for example analogous 
art claiming effective portions of SBE sequences including US 6,103,893. 

Claims 6, 16, and 26 were deemed indefinite because of the term "corresponding
region," as it is allegedly "unclear what nucleotides are encompassed by [t]his region or is the size
of the region clear." The rejection is rendered moot by amendment of the claims to
remove the objected-to phrase.

The term "functionally equivalent nucleotide sequence" in claim 2 has also
been deemed indefinite. The rejection is rendered moot by amendment of the claims
to remove the objected-to phrase. Such amendment does not change the scope of
the claim in that claim 2 is dependent upon claim 1 which recites the required 
functionality, making the phrase "functionally equivalent" redundant. 

Claim 5 has been deemed indefinite as it was unclear what the term "having 
the amino acid sequence NSKH at about residue 697" was intended to mean. The 
Examiner correctly ascertained that NSKH referred to the amino acid sequence
"Asn-Ser-Lys-His." These are commonly used abbreviations in the art as evidenced by
Lehninger, Principles of Biochemistry, Worth Publishers, Inc., New York, p. 96 (1982)
[enclosed].

The term "stringent hybridization conditions" is deemed to render claim 2 
indefinite as the specification allegedly does not provide a standard for ascertaining 
the requisite degree such that one skilled in the art would be reasonably apprised of 
the metes and bounds of the invention. These conditions are exemplified at page 4 
of the PCT publication. 

Claim 21 was deemed indefinite as the term "the cassava SBE I gene" has
insufficient antecedent basis. Claim 21 has been amended to overcome this rejection.

Claims 20-21 have been deemed indefinite with respect to the term "at least a 
part of." Claims 20 and 21 have been amended to overcome this rejection. 
Claim 22 has been deemed indefinite as not written in proper Markush format.
The claim has been amended to overcome the rejection. 

Claim 23 has been deemed indefinite in that it is allegedly unclear which starch
properties differ. The specification provides guidance as to which starch properties
will differ. SBE II is responsible for starch branching. Thus, interference with the 
expression of SBE II in the host cell will result in starch with less branching relative to 
an unaltered cell. This has been clarified in the claims as amended. 

The term "growing" in claims 24-25 has been deemed indefinite as the plants 
are regenerated, rather than grown, from plant cells. Claims 24-25 have been 
amended according to the Examiner's suggestion. 

The term "said transcript and/or translation product" in claims 16 and 18 has been deemed
indefinite for lack of antecedent basis. The claims have been amended to provide
proper antecedent basis or to otherwise comply with patent practice. 

Claims 16 and 18 are indefinite in that it is allegedly unclear to what the gene 
is homologous nor is it allegedly clear to which gene is being referred. Applicants 
respectfully disagree as a person skilled in the art would understand that when 
introducing a nucleic acid sequence in the sense or antisense orientation to interfere 
with the expression of a homologous gene naturally present in the cell, that the 
homologous gene is that which is of the same effective functionality as the nucleic 
acid being introduced. Thus, if SEQ ID NO. 29, which encodes the functionality of an
SBE II gene, is introduced, the homologous gene would be the SBE II gene
naturally present in the cell.

Claims 24 and 27 have been amended according to the Examiner's 
suggestion to overcome the indefinite rejections. 

Claims 16-24 have been rejected as indefinite as "being incomplete for 
omitting essential steps." Claim 16 has been amended to refer to a method of 
altering the expression of a gene in a plant cell and the last method step recited in the 
claim results in the alteration of said expression level. 

Claims 1-2, 4, 6-8, 11 and 32 have been rejected under 35 U.S.C. § 102(b) as 
being anticipated by Burton, et al. (1995, GenBank Accession No. X80009 and Plant 
J. 8:3-15) as "Burton teaches a pea nucleic acid that ... encodes a protein with SBE I 
activity." In contrast, the present invention claims a nucleic acid sequence which
encodes a polypeptide with SBE II activity. Thus, the present invention is novel over
Burton, et al. 
Claims 1-2, 4 and 11 are rejected under 35 U.S.C. §102(b) as being
anticipated by Fisher, et al. (1996, GenBank Accession No. U22428 and Plant Mol.
Biol. 30:97-108) as Fisher teaches "a nucleic acid that encodes an SBE II," "would 
share at least one amino acid with SEQ ID NO 29, and the nucleic acid would 
hybridize to SEQ ID NO 28." Fisher discloses a nucleic acid that encodes a maize 
SBE II. In contrast, the present application discloses the sequence for cassava SBE 
II and claims the nucleic acid sequence which encodes for a polypeptide having SBE 
II activity in cassava. There is no evidence that the maize SBE II would have SBE II 
activity in cassava. Thus, the rejection has been overcome. 

Claims 1-2, 4, 6-8, 11, 16-17, 22-27 and 32 have been rejected under 35
U.S.C. §103 as being unpatentable over Hofvander, et al. (WO 92/11375) in view of
Burton et al. and Fisher, et al. as Hofvander "discloses a method of using antisense 
constructs of nucleic acids encoding BE to alter a plant host cell." "Hofvander does 
not disclose the use of nucleic acids encoding other SBE enzymes." As detailed 
above, neither Burton nor Fisher disclose the nucleic acid sequences of the present 
application. Thus, neither Burton nor Fisher cures the deficiency of Hofvander and 
the rejection has been overcome. 

In light of the amendment and arguments above, the application is in condition 
for allowance. Applicants respectfully request reconsideration and early action. 



Respectfully submitted, 




National Starch and Chemical Company 
P.O. Box 6500 

Bridgewater, NJ 08807-0500 
(908) 575-6152 



Karen G. Kaiser 
Attorney for Applicants 
Reg. No. 33,506 
Appendix A
(marked up claims) 

1. (amended once) [A] An isolated nucleic acid sequence encoding a polypeptide
having starch branching enzyme Class A (SBE II) activity in cassava, the encoded
polypeptide comprising at least an effective portion of the amino acid sequence [shown
in Figure 4 or Figure 13] of SEQ. ID. NO. 29 or SEQ. ID. NO. 31.

2. (amended once) A nucleic acid sequence according to claim 1, comprising 
nucleotides 21-2531 of the nucleic acid sequence [shown in Figure 4] of SEQ. ID. NO. 
29, or a functionally equivalent nucleotide sequence which hybridises under stringent 
hybridisation conditions with the nucleic acid sequence [shown in Figure 4] of SEQ. ID. 
NO. 29.

3. (amended once) A nucleic acid sequence according to claim 1, comprising 
nucleotides 131-2677 of the nucleic acid sequence [shown in Figure 13] of SEQ. ID. NO.
31, or a functionally equivalent sequence which hybridises under stringent hybridisation
conditions with the nucleic acid sequence [shown in Figure 13] of SEQ. ID. NO. 31.

6. (amended once) [A] An isolated nucleic acid sequence comprising at least 200bp 
and exhibiting at least 88% sequence identity with [the corresponding region of] the DNA 
sequence [shown in Figures 4, 9, 10 or 13] of SEQ. ID. NO. 29 or SEQ. ID. NO. 31,
operably linked in the sense or anti-sense orientation to a promoter operable in plants,
said sequence encoding a polypeptide having starch branching enzyme Class A (SBE II)
activity in cassava.

16. (amended once) A method of altering the expression of a gene naturally present in a
plant host cell, said gene encoding a polypeptide having SBE II activity in cassava, the 
method comprising introducing into the cell a nucleic acid sequence comprising at least 
200bp and exhibiting at least 88% sequence identity with [the corresponding region of] 
the DNA sequence [shown in Figures 4, 9, 10 or 13] of SEQ. ID. NO. 29 or SEQ. ID. NO. 
31, operably linked in the sense or anti-sense orientation to a suitable promoter active in
the host cell, and causing transcription of the introduced nucleotide sequence to produce
a transcript, said transcript and/or [the] a translation product thereof being sufficient to
interfere with the expression of [a homologous] the gene naturally present in the host
cell, [which homologous gene encodes a polypeptide having SBE activity] thereby
altering the expression of the gene.

17. (amended once) A method according to claim 16, wherein the host cell is selected 
from the group consisting of [from] a cassava cell, banana cell, potato cell, pea cell,
tomato cell, maize cell, wheat cell, barley cell, oat cell, sweet potato cell [or] and rice
plant cell.

18. (amended twice) A method according to claim 16, comprising the introduction of
one or more further nucleic acid sequences, operably linked in the sense or anti-sense
orientation to a suitable promoter active in the host cell, and causing transcription of the
one or more further nucleic acid sequences to produce a transcript, said transcript[s]
and/or a translation product[s] thereof being sufficient to interfere with the expression of
a [homologous] gene(s) naturally present in the host cell. 

20. (amended twice) A method according to claim 18, wherein the further nucleic acid 
sequence comprises [at least part of an] a portion of an SBE I gene effective to interfere 
with the expression of an SBE I gene naturally present in the host cell.

21. (amended once) A method according to claim 20, wherein the further nucleic acid
sequence comprises [at least part of the] a portion of a cassava SBE I gene effective to
interfere with the expression of an SBE I gene naturally present in the host cell.

22. (amended twice) A method according to claim 16, wherein the host cell is
selected from the group consisting of [one of the following:] cassava cell, banana cell,
potato cell, pea cell, tomato cell, maize cell, wheat cell, barley cell, oat cell, sweet
potato cell [or] and rice cell.

23. (amended once) A method according to claim 16, wherein the introduced 
sequence inhibits expression of the gene naturally present in the host cell and
wherein the altered host cell gives rise to starch [having different properties] which 
contains less branching compared to starch from an unaltered cell. 
24. (amended twice) A method according to any one of claims 16-22 [claim 16],
further comprising the step of [growing] regenerating the altered host cell into a plant 
or plantlet. 

25. A method of obtaining starch having altered properties, comprising [growing] 
regenerating a plant from an altered host cell according to the method of claim 24, and 
extracting the starch therefrom. 

26. A plant or plant cell into which has been artificially introduced a nucleic acid 
sequence comprising at least 200bp and exhibiting at least 88% sequence identity with
the corresponding region of the DNA sequence [shown in Figures 4, 9, 10 or 13] of SEQ.
ID. NO. 29 or SEQ. ID. NO. 31, operably linked in the sense or anti-sense orientation to
a promoter operable in plants, or the progeny thereof, wherein said sequence encodes a
polypeptide having starch branching enzyme Class A (SBE II) activity in cassava.

27. (amended once) A plant obtainable by the method of [according to] claim 24[, 
altered by the method of any one of claims 16-22].
Appendix B
(clean copy of pending claims) 

1. An isolated nucleic acid sequence encoding a polypeptide having starch branching 
enzyme Class A (SBE II) activity in cassava, the encoded polypeptide comprising at 
least an effective portion of the amino acid sequence of SEQ. ID. NO. 29 or SEQ. ID. 
NO. 31. 

2. (amended once) A nucleic acid sequence according to claim 1, comprising 
nucleotides 21-2531 of the nucleic acid sequence of SEQ. ID. NO. 29, or a functionally 
equivalent nucleotide sequence which hybridises under stringent hybridisation conditions 
with the nucleic acid sequence of SEQ. ID. NO. 29. 

3. (amended once) A nucleic acid sequence according to claim 1, comprising 
nucleotides 131-2677 of the nucleic acid sequence of SEQ. ID. NO. 31, or a functionally
equivalent sequence which hybridises under stringent hybridisation conditions with the 
nucleic acid sequence of SEQ. ID. NO. 31. 

4. (amended once) A nucleic acid sequence according to claim 1 comprising a 5' 
and/or a 3' untranslated region. 

5. (amended once) A nucleic acid sequence according to claim 1, encoding a
polypeptide having the amino acid sequence NSKH at about residue 697. 

6. (amended once) An isolated nucleic acid sequence comprising at least 200bp and 
exhibiting at least 88% sequence identity with the DNA sequence of SEQ. ID. NO. 29 or 
SEQ. ID. NO. 31, operably linked in the sense or anti-sense orientation to a promoter 
operable in plants, said sequence encoding a polypeptide having starch branching 
enzyme Class A (SBE II) activity in cassava. 

7. A nucleic acid sequence according to claim 6, comprising at least 300-600bp. 

8. (amended once) A sequence according to claim 6, comprising a 5' and/or
3' untranslated region.
9. A sequence according to claim 8, comprising nucleotides 688-1044 of the sequence
shown in Figure 9, and/or nucleotides 1507-1900 of the sequence shown in Figure 10. 

10. A sequence according to claim 6, comprising the nucleotide sequence shown in 
Figure 10. 

11. (amended once) A replicable nucleic acid construct comprising a nucleic acid
sequence according to claim 1.

16. (amended once) A method of altering the expression of a gene naturally present in a 
plant host cell, said gene encoding a polypeptide having SBE II activity in cassava, the 
method comprising introducing into the cell a nucleic acid sequence comprising at least 
200bp and exhibiting at least 88% sequence identity with the DNA sequence of SEQ. ID. 
NO. 29 or SEQ. ID. NO. 31, operably linked in the sense or anti-sense orientation to a 
suitable promoter active in the host cell, and causing transcription of the introduced 
nucleotide sequence to produce a transcript, said transcript and/or a translation product 
thereof being sufficient to interfere with the expression of the gene naturally present in 
the host cell, thereby altering the expression of the gene. 

17. (amended once) A method according to claim 16, wherein the host cell is selected 
from the group consisting of a cassava cell, banana cell, potato cell, pea cell, tomato 
cell, maize cell, wheat cell, barley cell, oat cell, sweet potato cell and rice plant cell. 

18. (amended twice) A method according to claim 16, comprising the introduction of 
one or more further nucleic acid sequences, operably linked in the sense or anti-sense 
orientation to a suitable promoter active in the host cell, and causing transcription of the 
one or more further nucleic acid sequences to produce a transcript, said transcript 
and/or a translation product thereof being sufficient to interfere with the expression of a 
gene(s) naturally present in the host cell. 

19. A method according to claim 18, wherein the one or more further nucleic acid 
sequences interfere with the expression of a gene involved in starch biosynthesis. 
20. (amended twice) A method according to claim 18, wherein the further nucleic acid
sequence comprises a portion of an SBE I gene effective to interfere with the expression 
of an SBE I gene naturally present in the host cell. 

21. (amended once) A method according to claim 20, wherein the further nucleic acid 
sequence comprises a portion of a cassava SBE I gene effective to interfere with the 
expression of an SBE I gene naturally present in the host cell. 

22. (amended twice) A method according to claim 16, wherein the host cell is
selected from the group consisting of cassava cell, banana cell, potato cell, pea cell, 
tomato cell, maize cell, wheat cell, barley cell, oat cell, sweet potato cell and rice cell. 

23. (amended once) A method according to claim 16, wherein the introduced 
sequence inhibits expression of the gene naturally present in the host cell and 
wherein the altered host cell gives rise to starch which contains less branching 
compared to starch from an unaltered cell. 

24. (amended twice) A method according to any one of claims 16-22, further 
comprising the step of regenerating the altered host cell into a plant or plantlet. 

25. A method of obtaining starch having altered properties, comprising regenerating a 
plant from an altered host cell according to the method of claim 24, and extracting the 
starch therefrom. 

26. A plant or plant cell into which has been artificially introduced a nucleic acid 
sequence comprising at least 200bp and exhibiting at least 88% sequence identity with 
the corresponding region of the DNA sequence of SEQ. ID. NO. 29 or SEQ. ID. NO. 31, 
operably linked in the sense or anti-sense orientation to a promoter operable in plants, or 
the progeny thereof, wherein said sequence encodes a polypeptide having starch 
branching enzyme Class A (SBE II) activity in cassava. 

27. (amended once) A plant obtainable by the method of claim 24. 

32. A replicable nucleic acid construct comprising a nucleic acid sequence according
to claim 6. 

Title: Improvements in or Relating to Starch Content of Plants

Field of the Invention

This invention relates to novel nucleic acid sequences, vectors and host cells comprising 
the nucleic acid sequence(s), to polypeptides encoded thereby, and to a method of altering 
a host cell by introducing the nucleic acid sequence(s) of the invention.

Background to the Invention

Starch consists of two main polysaccharides, amylose and amylopectin. Amylose is a
linear polymer containing α-1,4 linked glucose units, while amylopectin is a highly
branched polymer consisting of an α-1,4 linked glucan backbone with α-1,6 linked glucan
branches. In most plant storage reserves amylopectin constitutes about 75% of the starch
content. Amylopectin is synthesized by the concerted action of soluble starch synthase and
starch branching enzyme [α-1,4 glucan: α-1,4 glucan 6-glycosyltransferase, EC 2.4.1.18].
Starch branching enzyme (SBE) cleaves α-1,4 linkages and rejoins the cleaved glucan,
via an α-1,6 linkage, to an acceptor chain to produce a branched structure. The physical
properties of starch are strongly affected by the relative abundance of amylose and
amylopectin, and SBE is therefore a crucial enzyme in determining both the quantity and
quality of starches produced in plant systems.

Starches are commercially available from several plant sources including maize, potato and 
cassava. Each of these starches has unique physical characteristics and properties and a 
variety of possible industrial uses. In maize there are a number of naturally occurring
mutants with altered starch composition, such as high-amylopectin ("waxy") starches or
high-amylose starches, but in potato and cassava no such mutants are yet available on a
commercial basis.

Genetic modification offers the possibility of obtaining new starches which may have
novel and potentially useful characteristics. Most of the work to date has involved potato
plants because they are amenable to genetic manipulation, i.e. they can be transformed
using Agrobacterium and regenerated easily from tissue culture. In addition, many of the
genes involved in starch biosynthesis have been cloned from potato and are thus available
as targets for genetic manipulation, for example by antisense inhibition of expression or
sense suppression.

Cassava (Manihot esculenta L. Crantz) is an important crop in the tropics, where its starch- 
filled roots are used both as a food source and increasingly as a source of starch. Cassava is 
a high yielding perennial crop that can grow on poor soils and is also tolerant of drought. 
Cassava starch, being a root-derived starch, has properties similar but not identical to potato
starch and is composed of 20-25% amylose and 75-80% amylopectin (Rickard et al, 1991
Trop. Sci. 31, 189-207). Some of the genes involved in starch biosynthesis have been
cloned from cassava, including starch branching enzyme I (SBE I) (Salehuzzaman et al,
1994 Plant Science 98, 53-62) and granule bound starch synthase I (GBSS I)
(Salehuzzaman et al, 1993 Plant Molecular Biology 23, 947-962), and some work has been
done on their expression patterns, although only in in vitro grown plants (Salehuzzaman et
al, 1994 Plant Science 98, 53-62).

In most plants studied to date e.g. maize (Boyer & Preiss, 1978 Biochem. Biophys. Res. 
Comm. 80, 169-175), rice (Smyth, 1988 Plant Sci. 57, 1-8) and pea (Smith, Planta 175, 
270-279), two forms of SBE have been identified, each encoded by a separate gene. A 
recent review by Burton et al. (1995 The Plant Journal 7, 3-15) has demonstrated that the
two forms of SBE constitute distinct classes of the enzyme such that, in general, enzymes 
of the same class from different plants may exhibit greater similarity than enzymes of 
different classes from the same plant. In their review, Burton et al. termed the two 
respective enzyme families class "A" and class "B", and the reader is referred thereto (and 
to the references cited therein) for a detailed discussion of the distinctions between the two 
classes. One general distinction of note would appear to be the presence, in class A SBE
molecules, of a flexible N-terminal domain, which is not found in class B molecules. The
distinctions noted by Burton et al. are relied on herein to define class A and class B SBE
molecules, which terms are to be interpreted accordingly.

Many organisations are interested in obtaining modified cassava starches by means of
genetic modification. This cannot be achieved, however, unless the plant is amenable
to transformation and regeneration and the starch biosynthesis genes to be targeted for
modification have been cloned. The production of transgenic cassava plants has
only recently been demonstrated (Taylor et al, 1996 Nature Biotechnology 14, 726-730;
Schopke et al, 1996 Nature Biotechnology 14, 731-735; and Li et al, 1996 Nature
Biotechnology 14, 736-740). The present invention concerns the identification, cloning
and sequencing of a starch biosynthetic gene from cassava, suitable as a target for genetic
manipulation.

Summary of the Invention 

In a first aspect the invention provides a nucleic acid sequence encoding a polypeptide 
having starch branching enzyme (SBE) activity, the polypeptide comprising an effective 
portion of the amino acid sequences shown in Figure 4 (SEQ. ID. NO. 29) or Figure 13 
(SEQ. ID. NO. 31). The nucleic acid is conveniently in substantial isolation, especially in 
isolation from other naturally associated nucleic acid sequences. 

An "effective portion" of the amino acid sequences may be defined as a portion which 
retains sufficient SBE activity when expressed in E. coli KV832 to complement the 
branching enzyme mutation therein. The amino acid sequences shown in Figures 4 (SEQ. 
ID. NO. 29) and 13 (SEQ. ID. NO. 31) include the N terminal transit peptide, which 
comprises about the first 50 amino acid residues. As those skilled in the art will be well 
aware, such a transit peptide is not essential for SBE activity. Thus the mature 
polypeptide, lacking a transit peptide, may be considered as one example of an effective 
portion of the amino acid sequence shown in Figure 4 (SEQ. ID. NO. 29) or Figure 13 
(SEQ. ID. NO. 31). 

Other effective portions may be obtained by effecting minor deletions in the amino acid
sequence, whilst substantially preserving SBE activity. Comparison with known class A
SBE sequences, with the benefit of the disclosure herein, will enable those skilled in the art
to identify regions of the polypeptide which are less well conserved and so amenable to
minor deletion, or amino acid substitution (particularly, conservative amino acid
substitution) whilst substantially preserving SBE activity. Such less well-conserved
regions are generally found in the N-terminal amino acid residues (up to the triple proline
"elbow" at residues 138-140 in Figure 4 (SEQ. ID. NO. 29) and up to the proline elbow at
residues 143-145 in Figure 13 (SEQ. ID. NO. 31)) and in the last 50 or so residues of the
C terminus, in particular in the acidic tail at the C terminus.

Conveniently the nucleic acid sequence is obtainable from cassava, preferably obtained
therefrom, and typically encodes a polypeptide obtainable from cassava. In a particular
embodiment, the encoded polypeptide may have the amino acid sequence NSKH at about
position 697 (in relation to Figure 4 (SEQ. ID. NO. 29)), which sequence appears peculiar
to an isoform of the SBE class A enzyme of cassava, other class A SBE enzymes having
the conserved sequence DA D/E Y (Burton et al, 1995 cited above).

In a particular aspect of the invention there is provided a nucleic acid comprising a portion 
of nucleotides 21 to 2531 of the nucleic acid sequence shown in Figure 4 (SEQ. ID. NO. 
28), or a functionally equivalent nucleic acid sequence. Such functionally equivalent 
nucleic acid sequences include, but are not limited to, those sequences which encode 
substantially the same amino acid sequence but which differ in nucleotide sequence from 
that shown in Figure 4 (SEQ. ID. NO. 28) by virtue of the degeneracy of the genetic code. 
For example, a nucleic acid sequence may be altered (e.g. "codon optimised") for 
expression in a host other than cassava, such that the nucleotide sequence differs 
substantially whilst the amino acid sequence of the encoded polypeptide is unchanged. 
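The degeneracy argument can be illustrated with a short, self-contained sketch. It assumes the standard genetic code (NCBI translation table 1) and uses two hypothetical 18-nucleotide fragments, not taken from the cassava sequences, in which every codon of one is replaced by a synonymous codon in the other; `translate` and `nucleotide_identity` are helpers written for this illustration only.

```python
from itertools import product

# Standard genetic code (NCBI table 1). Codons are enumerated with the first
# base varying slowest, each position in T, C, A, G order; '*' marks a stop.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate an in-frame DNA sequence, ignoring any trailing partial codon."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, usable, 3))


def nucleotide_identity(a: str, b: str) -> float:
    """Percent identity of two equal-length, ungapped sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return 100.0 * sum(x == y for x, y in zip(a, b)) / len(a)


# Hypothetical fragments: GAC->GAT, AAG->AAA, GAT->GAC and TAT->TAC are all
# synonymous changes, so both fragments encode the same peptide.
native = "ATGGACAAGGATATGTAT"
recoded = "ATGGATAAAGACATGTAC"

print(translate(native), translate(recoded))            # MDKDMY MDKDMY
print(round(nucleotide_identity(native, recoded), 1))   # 77.8
```

In this toy case the recoded fragment encodes an identical peptide while sharing only about 78% nucleotide identity with the original, which shows how far a codon-optimised sequence can diverge at the DNA level.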
Other functionally equivalent nucleic acid sequences are those which will hybridise under
stringent hybridisation conditions (e.g. as described by Sambrook et al, Molecular
Cloning: A Laboratory Manual, CSH, i.e. washing with 0.1× SSC, 0.5% SDS at 68°C) with
the sequence shown in Figure 4 (SEQ. ID. NO. 28). Figure 10 shows a functionally
equivalent sequence designated "125 + 94", which includes a region corresponding to the
3' coding portion of the sequence in Figure 4 (SEQ. ID. NO. 28). Figure 13 (SEQ. ID. NO.
30) shows a functionally equivalent sequence which comprises a second complete SBE
coding sequence (the SBE-derived sequence is from nucleotides 35 to 2760, of which the
coding sequence is nucleotides 131-2677; the rest of the sequence in the figure is
vector-derived).
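The coordinates quoted for Figure 13 can be checked with simple arithmetic. The sketch below assumes the ranges are 1-based and inclusive, which is the usual convention for sequence figures but is not stated explicitly here; the variable names are illustrative.

```python
def span_length(start: int, end: int) -> int:
    """Number of nucleotides in a 1-based, inclusive range."""
    return end - start + 1


# Ranges quoted above for Figure 13 (SEQ. ID. NO. 30), assumed 1-based inclusive.
SBE_INSERT = (35, 2760)
CODING = (131, 2677)

insert_len = span_length(*SBE_INSERT)    # SBE-derived sequence
cds_len = span_length(*CODING)           # coding sequence
codons, leftover = divmod(cds_len, 3)    # whole codons and any remainder
upstream = CODING[0] - SBE_INSERT[0]     # insert bases before the coding sequence
downstream = SBE_INSERT[1] - CODING[1]   # insert bases after the coding sequence

print(insert_len, cds_len, codons, leftover, upstream, downstream)
# 2726 2547 849 0 96 83
```

A coding range that is an exact multiple of three is consistent with a complete reading frame; if the range includes the stop codon (again an assumption), it would correspond to a polypeptide of 848 residues.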

Functionally equivalent DNA sequences will preferably comprise at least 200-300bp, more
preferably 300-600bp, and will exhibit at least 88% identity (more preferably at least 90%,
and most preferably at least 95% identity) with the corresponding region of the DNA
sequence shown in Figures 4 (SEQ. ID. NO. 28) or 10. Those skilled in the art will readily
be able to conduct a sequence alignment between the putative functionally equivalent
sequence and those detailed in Figures 4 (SEQ. ID. NO. 28) or 10; the identity of the two
sequences is to be compared over those regions which standard sequence alignment
software aligns as corresponding regions.
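A minimal sketch of the identity comparison described above follows. It assumes the two sequences have already been aligned by standard software and are supplied as gapped strings of equal length. Columns containing a gap are excluded from the count here, which is one common convention but not the only one. The helper names, and the use of 200 bp and 88% as default thresholds, are illustrative choices drawn from the preferred values in the text.

```python
from typing import Tuple


def percent_identity(aligned_a: str, aligned_b: str) -> Tuple[int, float]:
    """Return (aligned length, percent identity) for two gapped sequences.

    Columns in which either sequence has a gap ('-') are skipped, so only
    positions that the alignment pairs up contribute to the score.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    paired = [
        (x, y)
        for x, y in zip(aligned_a.upper(), aligned_b.upper())
        if x != "-" and y != "-"
    ]
    if not paired:
        return 0, 0.0
    matches = sum(x == y for x, y in paired)
    return len(paired), 100.0 * matches / len(paired)


def meets_thresholds(aligned_a: str, aligned_b: str,
                     min_length: int = 200, min_identity: float = 88.0) -> bool:
    """True if the aligned region is both long enough and identical enough."""
    length, identity = percent_identity(aligned_a, aligned_b)
    return length >= min_length and identity >= min_identity


# Hypothetical 240-nt reference and a variant carrying 20 substitutions.
reference = "ACGT" * 60
variant = "".join("T" if i % 12 == 0 else base for i, base in enumerate(reference))

length, identity = percent_identity(reference, variant)
print(length, round(identity, 2))                          # 240 91.67
print(meets_thresholds(reference, variant))                # True
print(meets_thresholds(reference[:150], variant[:150]))    # False: region too short
```

Counting gap columns as mismatches instead would lower the score for gapped alignments, so the convention should be fixed before any threshold is applied.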

In particular embodiments the nucleic acid sequence may alternatively comprise a 5' and/or 
a 3' untranslated region ("UTR"), examples of which are shown in Figures 2 and 4 (SEQ. 
ID. NO. 28). Figure 9 includes a 3' UTR as nucleotides 688-1044, and Figure 10 includes a
3' UTR as nucleotides 1507-1900 (in each case running from the first base after the
stop codon to the base immediately preceding the poly(A) tail). Any one of the
sequences defined above, or a functional equivalent thereof (as defined by hybridisation 
properties, as set out in the preceding paragraph), could be useful in sense or anti-sense 
inhibition of corresponding genes, as will be apparent to those skilled in the art. It will 
also be apparent to those skilled in the art that such regions may be modified so as to 
optimise expression in a particular type of host cell and that the 5' and/or 3' UTRs could be 
used in isolation, or in combination with a coding portion of the sequence of the invention. 
Similarly, a coding portion could be used without a 5' or a 3' UTR if desired. 

In a further aspect, the invention provides a replicable nucleic acid construct comprising 
any one of the nucleic acid sequences defined above. The construct will typically comprise 
a selectable marker and may allow for expression of the nucleic acid sequence of the
invention. Conveniently the vector will comprise a promoter (especially a promoter
sequence operable in a plant and/or a promoter operable in a bacterial cell) and one or
more regulatory signals known to those skilled in the art.

In another aspect the invention provides a polypeptide having SBE activity, the polypeptide
comprising an effective portion of the amino acid sequence shown in Figure 4 (SEQ. ID.
NO. 29) or Figure 13 (SEQ. ID. NO. 31). The polypeptide is conveniently one obtainable
from cassava, although it may be derived using recombinant DNA techniques. The
polypeptide is preferably in substantial isolation from other polypeptides of plant origin,
and more preferably in substantial isolation from any other polypeptides. The polypeptide
may have amino acid residues NSKH at about position 697 (in the sequence shown in
Figure 4 (SEQ. ID. NO. 29)), instead of the sequence DA D/E Y found in other SBE class
A polypeptides. The polypeptide may be used in a method of modifying starch in vitro, the
method comprising treating starch under suitable conditions (of temperature, pH etc.) with
an effective amount of the polypeptide.

Those skilled in the art will appreciate that the disclosure of the present specification can 
be utilised in a number of ways. In particular, the characteristics of a host cell may be 
altered by recombinant DNA techniques. Thus, in a further aspect, there is provided a 
method by which a host cell may be altered by introduction of a nucleic acid sequence 
comprising at least 200bp and exhibiting at least 88% sequence identity (more preferably at 
least 90%, and most preferably at least 95% identity) with the corresponding region of the 
DNA sequence shown in Figures 4 (SEQ. ID. NO. 28), 9, 10 or 13 (SEQ. ID. NO. 30),
operably linked in the sense or (preferably) in the anti-sense orientation to a suitable 
promoter active in the host cell, and causing transcription of the introduced nucleic acid 
sequence, said transcript and/or the translation product thereof being sufficient to interfere 
with the expression of a homologous gene naturally present in said host cell, which 
homologous gene encodes a polypeptide having SBE activity. The altered host cell is 
typically a plant cell, such as a cell of a cassava, banana, potato, sweet potato, tomato, pea, 
wheat, barley, oat, maize, or rice plant.

Desirably the method further comprises the introduction of one or more nucleic acid
sequences which are effective in interfering with the expression of another homologous
gene or genes naturally present in the host cell. Such other genes whose expression is
inhibited may be involved in starch biosynthesis (e.g. an SBE I gene), or may be unrelated
to SBE II.

Those skilled in the art will be aware that both antisense inhibition and "sense
suppression" of gene expression, especially of plant genes, have been demonstrated (e.g.
Matzke & Matzke 1995 Plant Physiol. 107, 679-685).

It is believed that antisense methods operate mainly through the production of antisense
mRNA which hybridises to the sense mRNA, preventing its translation into functional
polypeptide, possibly by causing the hybrid RNA to be degraded (e.g. Sheehy et al, 1988
PNAS 85, 8805-8809; Van der Krol et al, Mol. Gen. Genet. 220, 204-212). Sense
suppression also requires homology between the introduced sequence and the target gene,
but the exact mechanism is unclear. It is apparent, however, that in relation to both
antisense and sense suppression, neither a full-length nucleotide sequence nor a "native"
sequence is essential. Preferably the nucleic acid sequence used in the method will
comprise at least 200-300bp, more preferably at least 300-600bp, of the full-length
sequence, but other fragments (smaller or larger) which are functional in altering the
characteristics of the plant may be found by simple trial and error. It is also known that
untranslated portions of sequence can suffice to inhibit expression of the homologous gene;
coding portions may be present within the introduced sequence, but they do not appear to
be essential under all circumstances.

The inventors have discovered that there are at least two class A SBE genes in cassava. A
fragment of a second gene has been isolated, which fragment encodes the C-terminal 481
amino acids of cassava class A SBE (see Figure 10) and comprises a 3' untranslated
region. Subsequently, a complete clone of the second gene was also recovered (see
Figure 12). The coding portions of the two genes show some slight differences, and the
second SBE gene may be considered functionally equivalent to the corresponding portion
of the nucleotide sequence shown in Figure 4 (SEQ. ID. NO. 28). However, the 3'
untranslated regions of the two genes show marked differences. Thus the method of
altering a host cell may comprise the use of a sufficient portion of either gene to inhibit
the expression of the naturally occurring homologous gene. Conveniently, a portion of
nucleotide sequence is employed which is conserved between both genes.
Alternatively, sufficient portions of both genes may be employed, typically using a single
construct to direct the transcription of both introduced sequences.

In addition, as explained above, it may be desired to cause inhibition of expression of the 
class B SBE (i.e. SBE I) in the same host cell. A number of class B SBE gene sequences
are known, including portions of the cassava class B SBE (Salehuzzaman et al, 1994 Plant
Science 98, 53-62), and any one of these may prove suitable. Preferably the sequence used
is that which derives from the host cell sought to be altered (e.g. when altering the 
characteristics of a cassava plant cell, it is generally preferred to use sense or anti-sense 
sequences corresponding exactly to at least portions of the cassava gene whose expression 
is sought to be inhibited). 

In a further aspect the invention provides an altered host cell, into which has been 
introduced a nucleic acid sequence comprising at least 200bp and exhibiting at least 88% 
sequence identity (more preferably at least 90%, and most preferably at least 95% identity) 
with the corresponding region of the DNA sequence shown in Figures 4 (SEQ. ID. NO.
28), 9, 10 or 13 (SEQ. ID. NO. 30), operably linked in the sense or anti-sense orientation to
a suitable promoter, said host cell comprising a natural gene sharing sequence homology 
with the introduced sequence. 

The host cell may be a micro-organism (such as a bacterial, fungal or yeast cell) or a plant 
cell. Conveniently the host cell altered by the method is a cell of a cassava plant, or 
another plant with starch storage reserves, such as banana, potato, sweet potato, tomato, 
pea, wheat, barley, oat, maize, or rice plant. Typically the sequence will be introduced in a 
nucleic acid construct, by way of transformation, transduction, micro-injection or another
method known to those skilled in the art. The invention also provides a plant into
which has been introduced a nucleic acid sequence of the invention, or the progeny of such 
a plant. 

The altered plant cell will preferably be grown into an altered plant, using techniques of
plant growth and cultivation well known to those skilled in the art of regenerating
plantlets from plant cells.

The invention also provides a method of obtaining starch from an altered plant, the plant 
being obtained by the method defined above. Starch may be extracted from the plant by 
any of the known techniques (e.g. milling). The invention further provides starch 
obtainable from a plant altered by the method defined above, the starch having altered 
properties compared to starch extracted from an equivalent but unaltered plant. 
Conveniently the altered starch is obtained from an altered plant selected from the group 
consisting of cassava, potato, pea, tomato, maize, wheat, barley, oat, sweet potato and rice. 
Typically the altered starch will have increased amylose content. 

The invention will now be further described by way of illustrative examples and with 
reference to the accompanying drawings, in which:- 

Figure 1 is a schematic illustration of the cloning strategy for cassava SBE II. The top line 
represents the size of a full length clone with distances in kilobases (kb) and arrows 
representing oligonucleotides (rightward pointing arrows are sense strand, leftward are on 
opposite strand). The long thick arrow is the open reading frame with start and stop 
codons shown. Below this are shown the 3' RACE, 5' RACE and PCR clones identified 
either by the plasmid name (shown in brackets above the line) or the clone number (shown 
to the left of the clone) for the 5' RACE only. Also shown (by an x) in the 5' RACE clones 
are positions of small deletions or introns. 

Figure 2 shows the DNA sequence and predicted ORF of csbe2con.seq. This sequence is a
consensus of 3' RACE clone pSJ94 and 5' RACE clones 27/9, 11 and 28. The first 64 base
pairs are derived from the RoRidT17 adaptor primer/dT tail, followed by the SBE sequence.
The single long open reading frame is shown in one-letter code below the double-stranded
DNA sequence. Also shown is the upstream ORF (MQL...LPW).
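Figure 2 distinguishes a single long ORF from a short upstream ORF. The following is a minimal sketch of how ATG-initiated open reading frames can be enumerated on the sense strand of a cDNA. It assumes the standard genetic code and a sequence written 5' to 3' in mRNA sense. `find_orfs` and the toy sequence are hypothetical; the Example states only that sequences were manipulated with DNASTAR software.

```python
from itertools import product
from typing import List, Tuple

# Standard genetic code (NCBI table 1), first base varying slowest in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}
STOP_CODONS = {codon for codon, aa in CODON_TABLE.items() if aa == "*"}


def find_orfs(cdna: str, min_aa: int = 1) -> List[Tuple[int, int, str]]:
    """ATG-to-stop ORFs in the three forward frames, longest first.

    Each ORF is (start, end, peptide), with 1-based inclusive coordinates
    that include the stop codon. An in-frame ATG inside an open ORF is not
    reported separately, and reading frames without a stop are ignored.
    """
    cdna = cdna.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(cdna) - 2, 3):
            codon = cdna[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                peptide = "".join(CODON_TABLE.get(cdna[j:j + 3], "X")
                                  for j in range(start, i, 3))
                if len(peptide) >= min_aa:
                    orfs.append((start + 1, i + 3, peptide))
                start = None
    return sorted(orfs, key=lambda orf: len(orf[2]), reverse=True)


# Toy cDNA: a short upstream ORF (MKL) followed by a longer main ORF.
cdna = "GC" + "ATGAAATTGTAA" + "CC" + "ATG" + "GCT" * 5 + "TGA" + "AA"
print(find_orfs(cdna))   # [(17, 37, 'MAAAAA'), (3, 14, 'MKL')]
```

A partial clone that begins inside the coding region, such as a 3' RACE product, has no start codon in frame, so for that case the ATG requirement would need to be relaxed.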

Figure 3 shows an alignment of the 5' region of cassava SBE II csbe2con and pSJ99 
(clones 20 and 35) DNA sequences. Differences from the consensus sequence are shaded. 

Figure 4 shows the DNA sequence and predicted ORF of full length cassava SBE II tuber
cDNA in pSJ107. The sequence shown runs from the CSBE214 (SEQ. ID. NO. 15) to the
CSBE218 (SEQ. ID. NO. 19) oligonucleotide. The DNA sequence is Seq ID No. 28
in the attached sequence listing; the amino acid sequence is Seq ID No. 29.

Figure 5 shows an alignment of the 3' region of cassava SBE II pSJ116 and 125+94 DNA
sequences. The top line is the 125 + 94 sequence and the bottom line the pSJ116 sequence.
Identical nucleotides are indicated by the same letter in the middle line, differences are 
indicated by a gap, and dashed lines indicate gaps introduced to optimise alignment. 

Figure 6 shows an alignment of the carboxy-terminal region of pSJ116 and 125+94 protein
sequences. The top sequence is from 125+94 and the bottom from pSJ116. Identical 
amino acid residues are shown with the same letter, conserved changes with a colon and 
neutral changes with a period. 

Figure 7 shows a phylogenetic tree of starch branching enzyme proteins. The length of 
each pair of branches represents the distance between sequence pairs. The scale beneath the 
tree measures the distance between sequences (units indicate the number of substitution 
events). Dotted lines indicate a negative branch length resulting from averaging of the tree.
Zmconl2.pro is maize SBE II, psstbl.pro is pea SBE I (Bhattacharyya et al 1990 Cell 60,
115-121) and atsbe2-1 & 2-2.pro are two SBE II proteins from Arabidopsis thaliana
(Fisher et al 1996 Plant Mol. Biol. 30, 97-108). SJ107.pro is representative of a cassava
SBE II sequence, and potsbe2.pro is a potato SBE II sequence known to the inventors.

Figure 8 is an alignment of SBE II proteins. Protein sequences are indicated in one letter 
code. The top line represents the consensus sequence, below which is shown the 
consensus ruler and the individual SBE II sequences. Residues matching the consensus are 
shaded. Dashes represent gaps introduced to optimise alignment. Sequence identities are 
shown at the right of the figure and are as in Figure 7, except that SJ107.pro is cassava SBE
II.

Figure 9 shows the DNA sequence and predicted ORF of a cassava SBE II cDNA isolated
by 3' RACE (plasmid pSJ101).

Figure 10 shows the consensus DNA sequence and predicted ORF of a second cassava 
SBE II cDNA isolated by 3' and 5' RACE (sequence designated 125+94 is from plasmid 
pSJ125 and pSJ94, spliced at the CSBE217 (SEQ. ID. NO. 18) oligo sequence).

Figure 11 is a schematic diagram of the plant transformation vector pSJ64. The black line 
represents the DNA sequence. The hashed line represents the bacterial plasmid backbone 
(containing the origin of replication and bacterial selection marker) and is not shown in 
full. The filled triangles represent the T-DNA borders (RB = right border, LB = left 
border). Relevant restriction enzyme sites are shown above the black line with the 
approximate distances (in kilobases) between sites marked by an asterisk shown
underneath. The thinnest arrows represent polyadenylation signals (pAnos = nopaline
synthase, pAg7 = Agrobacterium gene 7), the intermediate arrows represent protein coding
regions (SBE II = cassava SBE II, HYG = hygromycin resistance gene) and the thick
arrows represent promoter regions (P-2x35S = double CaMV 35S promoter, P-nos = 
nopaline synthase promoter). 

Figure 12 is a schematic illustration of the cloning strategy used to isolate a second cassava 
SBE II gene. The top line represents the size of a full length clone with distances in 
kilobases (kb) and arrows representing oligonucleotides (rightward pointing arrows are 
sense strand, leftward are on opposite strand). The long thick arrow is the open reading 
frame with start and stop codons shown. Below this are shown the 3' RACE, 5' RACE and
PCR clones identified either by the plasmid name (shown in brackets above the line) or the 
clone number (shown to the right of the clone). 

Figure 13 shows the DNA sequence and predicted ORF of a second full length cassava 
SBE II tuber cDNA in pSJ146. Nucleotides 35-2760 are SBE II sequence and the
remainder are from the pT7Blue vector. The DNA sequence of Figure 13 is Seq ID No. 
30, and the amino acid sequence is Seq ID No. 31, in the attached sequence listing. 

Example 1

This example relates to the isolation and cloning of SBE II sequences from cassava.

Recombinant DNA manipulations

Standard procedures were performed essentially according to Sambrook et al (1989
Molecular Cloning: A Laboratory Manual, 2nd edn. Cold Spring Harbor Laboratory Press,
Cold Spring Harbor, N.Y.). DNA sequencing was performed on an ABI automated DNA 
sequencer and sequences manipulated using DNASTAR software for the Macintosh. 

Rapid Amplification of cDNA ends (RACE) and PCR conditions 

5' and 3' RACE were performed essentially according to Frohman et al, (1988 Proc. Natl. 
Acad. Sci. USA 85, 8998-9002) but with the following modifications. 

For 3' RACE, 5 µg of total RNA was reverse transcribed using 5 pmol of the RACE
adaptor RoRidT17 as primer and Stratascript RNase H- reverse transcriptase (50 U) in a
50 µl reaction according to the manufacturer's instructions (Stratagene). The reaction was
incubated for 1 hour at 37°C and then diluted to 200 µl with TE (10 mM Tris-HCl, 1 mM
EDTA) pH 8 and stored at 4°C. 2.5 µl of this cDNA was used in a 25 µl PCR reaction
with 12.5 pmol of SBE A (SEQ. ID. NO. 1) and Ro primers for 30 cycles of 94°C 45 sec,
50°C 25 sec, 72°C 1 min 30 sec. A second round of PCR (25 cycles) was performed using
1 µl of this reaction as template in a 50 µl reaction under the same conditions. Amplified
products were separated by agarose gel electrophoresis and cloned into the pT7Blue vector
(Invitrogen).
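The quantities in this protocol can be checked with a small calculation. The sketch assumes that the RNA is evenly distributed through the diluted cDNA, so an aliquot carries a proportional share of the input RNA, and it ignores ramp times and any initial denaturation step, which are not stated. The function names are illustrative.

```python
from typing import Sequence


def rna_equivalent_ng(rna_ug: float, diluted_volume_ul: float, aliquot_ul: float) -> float:
    """Input RNA (ng) represented by an aliquot of a diluted RT reaction."""
    return rna_ug * 1000.0 * aliquot_ul / diluted_volume_ul


def primer_conc_um(primer_pmol: float, reaction_ul: float) -> float:
    """Final primer concentration in µM (1 pmol per µl equals 1 µM)."""
    return primer_pmol / reaction_ul


def hold_minutes(cycles: int, step_seconds: Sequence[float]) -> float:
    """Time spent at the programmed temperatures, excluding ramping."""
    return cycles * sum(step_seconds) / 60.0


# 3' RACE values from the protocol above.
cycle = (45, 25, 90)  # 94°C 45 sec, 50°C 25 sec, 72°C 1 min 30 sec
input_ng = rna_equivalent_ng(rna_ug=5.0, diluted_volume_ul=200.0, aliquot_ul=2.5)
sbe_a_um = primer_conc_um(primer_pmol=12.5, reaction_ul=25.0)
first_round = hold_minutes(30, cycle)
second_round = hold_minutes(25, cycle)

print(input_ng, sbe_a_um, first_round)   # 62.5 0.5 80.0
print(round(second_round, 1))            # 66.7
```

On these assumptions, each first-round PCR starts from cDNA equivalent to about 62.5 ng of total RNA, with SBE A at 0.5 µM.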

For the first round of 5' RACE, 5 µg of total leaf RNA was reverse transcribed as described
above using 10 pmol of the SBE II gene-specific primer CSBE22 (SEQ. ID. NO. 3). This
primer was removed from the reaction by diluting to 500 µl with TE and centrifuging twice
through a Centricon 100 microconcentrator. The concentrated cDNA was then dA-tailed
with 9 U of terminal deoxynucleotidyl transferase and 50 dATP in a 20 µl reaction in
buffer supplied by the manufacturer (BRL). The reaction was incubated for 10 min at
37°C and 5 min at 65°C and then diluted to 200 µl with TE pH 8. PCR was performed in a
50 µl volume using 5 µl of tailed cDNA, 2.5 pmol of RoRidT17 and 25 pmol of Ro and
CSBE24 (SEQ. ID. NO. 5) primers for 30 cycles of 94°C 45 sec, 55°C 25 sec, 72°C 3 min.
Amplified products were separated on a 1% TAE agarose gel and excised, 200 µl of TE
was added, and the gel slices were melted at 99°C for 10 min. Five µl of this was
re-amplified in a 50 µl volume using CSBE25 (SEQ. ID. NO. 6) and Ri as primers and 25
cycles of 94°C 45 sec, 55°C 25 sec, 72°C 1 min 30 sec. Amplified fragments were
separated on a 1% TAE agarose gel, purified on DEAE paper and cloned into pT7Blue.

The second round of 5' RACE was performed using the CSBE28 (SEQ. ID. NO. 9) and
CSBE29 (SEQ. ID. NO. 10) primers in the first and second round PCR reactions
respectively, using a new dA-tailed cDNA library primed with CSBE27 (SEQ. ID. NO. 8).

A third round of 5' RACE was performed on the same CSBE27 (SEQ. ID. NO. 8) primed 
cDNA. 

Repeat 3' RACE and PCR Cloning 

The 3' RACE library (RoRidT17 primed leaf RNA) was used as a template. The first PCR
reaction was diluted 1:20 and 1 µl was used in a 50 µl PCR reaction with SBE A (SEQ. ID.
NO. 1) and Ri primers, and the products were cloned into pT7Blue. The cloned PCR
products were screened for the presence or absence of the CSBE23 (SEQ. ID. NO. 4) oligo
by colony PCR.

A full length cDNA of cassava SBE II was isolated by PCR from leaf or root cDNA
(RoRidT17 primed) using primers CSBE214 (SEQ. ID. NO. 15) and CSBE218 (SEQ. ID.
NO. 19) from 2.5 µl of cDNA in a 25 µl reaction and 30 cycles of 94°C 45 sec, 55°C 25
sec, 72°C 2 min.

Complementation of E. coli mutant KV832 

SBE II containing plasmids were transformed into the branching enzyme deficient mutant
E. coli KV832 (Keil et al, 1987 Mol. Gen. Genet. 207, 294-301) and cells grown on solid
PYG medium (0.85% KH2PO4, 1.1% K2HPO4, 0.6% yeast extract) containing 1.0%
glucose. To test for complementation, a loop of cells was scraped off and resuspended in
150 µl water, to which was added 15 µl of Lugol's solution (2 g KI and 1 g I2 per 300 ml
water).

RNA isolation 

RNA was isolated from cassava plants by the method of Logemann (1987 Anal. Biochem.
163, 21-26). Leaf RNA was isolated from 0.5 g of in vitro grown plant tissue; the total
yield was 300 µg. Three-month-old roots (88 g) were used for isolation of root RNA.

SBE II specific oligonucleotides 



SBE A     ATGGACAAGGATATGTATGA   (Seq ID No. 1)
CSBE21    GGTTTCATGACTTCTGAGCA   (Seq ID No. 2)
CSBE22    TGCTCAGAAGTCATGAAACC   (Seq ID No. 3)
CSBE23    TCCAGTCTCAATATACGTCG   (Seq ID No. 4)
CSBE24    AGGAGTAGATGGTCTGTCGA   (Seq ID No. 5)
CSBE25    TCATACATATCCTTGTCCAT   (Seq ID No. 6)
CSBE26    GGGTGACTTCAATGATGTAC   (Seq ID No. 7)
CSBE27    GGTGTACATCATTGAAGTCA   (Seq ID No. 8)
CSBE28    AATTACTGGCTCCGTACTAC   (Seq ID No. 9)
CSBE29    CATTCCAACGTGCGACTCAT   (Seq ID No. 10)
CSBE210   TACCGGTAATCTAGGTGTTG   (Seq ID No. 11)
CSBE211   GGACCTTGGTTTAGATCCAA   (Seq ID No. 12)
CSBE212   ATGAGTCGCACGTTGGAATG   (Seq ID No. 13)
CSBE213   CAACACCTAGATTACCGGTA   (Seq ID No. 14)
CSBE214   TTAGTTGCGTCAGTTCTCAC   (Seq ID No. 15)
CSBE215   AATATCTATCTCAGCCGGAG   (Seq ID No. 16)
CSBE216   ATCTTAGATAGTCTGCATCA   (Seq ID No. 17)
CSBE217   TGGTTGTTCCCTGGAATTAC   (Seq ID No. 18)
CSBE218   TGCAAGGACCGTGACATCAA   (Seq ID No. 19)
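Several primers in this table are exact reverse complements of one another (for example, CSBE22 of CSBE21 and CSBE25 of SBE A). That is consistent with sense and antisense primers made to the same site for the 3' and 5' RACE steps, although this is an inference rather than something the text states. The short script below, whose dictionary simply transcribes the table and whose function names are illustrative, confirms which pairs are exact reverse complements.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

# Transcribed from the SBE II oligonucleotide table (Seq ID Nos. 1-19).
OLIGOS = {
    "SBE A": "ATGGACAAGGATATGTATGA",
    "CSBE21": "GGTTTCATGACTTCTGAGCA",
    "CSBE22": "TGCTCAGAAGTCATGAAACC",
    "CSBE23": "TCCAGTCTCAATATACGTCG",
    "CSBE24": "AGGAGTAGATGGTCTGTCGA",
    "CSBE25": "TCATACATATCCTTGTCCAT",
    "CSBE26": "GGGTGACTTCAATGATGTAC",
    "CSBE27": "GGTGTACATCATTGAAGTCA",
    "CSBE28": "AATTACTGGCTCCGTACTAC",
    "CSBE29": "CATTCCAACGTGCGACTCAT",
    "CSBE210": "TACCGGTAATCTAGGTGTTG",
    "CSBE211": "GGACCTTGGTTTAGATCCAA",
    "CSBE212": "ATGAGTCGCACGTTGGAATG",
    "CSBE213": "CAACACCTAGATTACCGGTA",
    "CSBE214": "TTAGTTGCGTCAGTTCTCAC",
    "CSBE215": "AATATCTATCTCAGCCGGAG",
    "CSBE216": "ATCTTAGATAGTCTGCATCA",
    "CSBE217": "TGGTTGTTCCCTGGAATTAC",
    "CSBE218": "TGCAAGGACCGTGACATCAA",
}


def reverse_complement(seq: str) -> str:
    """Reverse complement of an unambiguous A/C/G/T sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def reverse_complement_pairs(oligos: dict) -> list:
    """Return (name, name) pairs whose sequences are exact reverse complements."""
    names = sorted(oligos)
    pairs = []
    for i, first in enumerate(names):
        rc = reverse_complement(oligos[first])
        for second in names[i + 1:]:
            if oligos[second].upper() == rc:
                pairs.append((first, second))
    return pairs


if __name__ == "__main__":
    for first, second in reverse_complement_pairs(OLIGOS):
        print(first, "<->", second)
```

The scan reports four exact pairs (CSBE21/CSBE22, CSBE210/CSBE213, CSBE29/CSBE212 and SBE A/CSBE25). CSBE26 and CSBE27 overlap as well but are offset by three bases, so an exact-match test does not report them.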



RESULTS 
Cloning of an SBE II gene from cassava leaf

The strategy for cloning a full length cDNA of starch branching enzyme II of cassava is 
shown in Figure 1. A comparison of several SBE II (class A) DNA sequences
identified a 23 bp region which appears to be completely conserved among most genes 
(data not shown) and is positioned about one kilobase upstream from the 3' end of the 
gene. An oligonucleotide primer (designated SBE A, SEQ. ID. NO. 1) was made to this 
sequence and used to isolate a partial cDNA clone by 3' RACE PCR from first strand leaf 
cDNA as illustrated in Figure 1. An approximately 1100 bp band was amplified, cloned 
into pT7Blue vector and sequenced. This clone was designated pSJ94 and contained a 
1120 bp insert starting with the SBE A (SEQ. ID. NO. 1) oligo and ending with a polyA 
tail. There was a predicted open reading frame of 235 amino acids which was highly 
homologous (79% identical) to a potato SBE II also isolated by the inventors (data not 
shown) suggesting that this clone represented a class A (SBE II) gene. 
To obtain the sequence of a full-length clone, nested primers were made complementary to
the 5' end of this sequence and used in 5' RACE PCR to isolate clones from the 5' region of
the gene. A total of three rounds of 5' RACE was needed to determine the sequence of the
complete gene (i.e. one that has a predicted long ORF preceded by stop codons). It should
be noted that during this cloning process several clones (#23, #9 and #16) were obtained that
had small deletions, and in one case (clone 23) there was also a small (120 bp) intron present.
These occurrences are not uncommon and probably arise through errors in the PCR process
and/or reverse transcription of incompletely processed RNA (heterogeneous nuclear RNA).
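The working definition of a complete gene used here (a long predicted ORF preceded by in-frame stop codons) can be expressed as a small forward-strand scan. The sketch below is a simplification under stated assumptions: it uses the standard genetic code, searches only the strand given, takes the longest ATG-to-stop ORF, and reports whether an in-frame stop codon lies upstream of its ATG (evidence that the 5' end has been reached rather than truncated). The function names are hypothetical.

```python
# Standard genetic code (assumed), codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate(seq: str) -> str:
    """Translate complete codons; '*' marks stops, 'X' unknown codons."""
    seq = seq.upper()
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def longest_orf(seq: str):
    """Return (start, end, protein, upstream_stop) for the longest ATG..stop ORF.

    start/end are 0-based nucleotide coordinates (end exclusive, stop codon
    included); upstream_stop is True if an in-frame stop precedes the ATG.
    Only the given strand is searched. Returns None if no complete ORF exists.
    """
    best = None
    for frame in range(3):
        protein = translate(seq[frame:])
        pos = 0
        while True:
            met = protein.find("M", pos)
            if met == -1:
                break
            stop = protein.find("*", met)
            if stop == -1:
                break  # ORF runs off the end of the sequence: not complete
            if best is None or stop - met > len(best[2]):
                best = (
                    frame + 3 * met,
                    frame + 3 * (stop + 1),
                    protein[met:stop],
                    "*" in protein[:met],
                )
            pos = stop + 1  # skip nested, shorter ORFs in the same frame
    return best


if __name__ == "__main__":
    # First ten codons of the pSJ107 coding sequence (SEQ ID NO. 28).
    print(translate("ATGGGACACTACACCATATCAGGAATACGT"))  # MGHYTISGIR
    print(longest_orf("TAAATGGCTTGGTAA"))  # (3, 15, 'MAW', True)
```

Because the scan is exact and single-stranded, small deletions or retained introns of the kind described above could show up as frame shifts or premature stops in the reported ORF.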

The overlapping cDNA fragments could be assembled into a contiguous 3 kb sequence 
(designated csbe2con.seq) which contained one long predicted ORF as shown in Figure 2. 
Several clones in the last round of 5' RACE were obtained which included sequence of the 
untranslated leader (UTL). All of these clones had an ORF (42 amino acids) 46 bp 
upstream and out of frame with that of the long ORF. 

There is more than one SBE II gene in cassava 

In order to determine if the assembled sequence represented that of a single gene, attempts
were made to recover by PCR a full-length SBE II gene using primers CSBE214 (SEQ. ID.
NO. 15) and CSBE23 (SEQ. ID. NO. 4) at the 5' and 3' ends of the csbe2con sequence,
respectively. All attempts were unsuccessful using either leaf or root cDNA as template.
The PCR was therefore repeated with either the 5'-most or 3'-most primer and complementary
primers along the length of the SBE II gene to determine the size of the largest fragment
that could be amplified. With the CSBE214 (SEQ. ID. NO. 15) primer, fragments could be
amplified using primers 210, 28, 27 and 22, in order of increasing distance, the latter primer
pair amplifying a 2.2 kb band. With the 3' primer CSBE23 (SEQ. ID. NO. 4), only primer
pairs with 21 and 26 gave amplification products, the latter being about 1200 bp. These
results suggest that the original 3' RACE clone (pSJ94) is derived from a different SBE II
gene than the rest of the 5' RACE clones, even though the two largest PCR fragments
(214+22 and 26+23) overlap by 750 bp and share several primer sites. It is likely that the
sequences of the two genes start to diverge around the CSBE22 (SEQ. ID. NO. 3) primer
site, such that the 3' end of the corresponding gene does not contain the 23 primer site and
therefore cannot be amplified from cDNA when the 23 primer is used with the 214 primer.

To confirm this, the sequence of the longest 5' PCR fragment (214+22) from two clones
(#20, designated pSJ99, and #35) was determined and compared to the consensus sequence
csbe2con, as shown in Figure 3. The first 2000 bases are nearly identical (the single-base
changes might well be PCR errors); however, the consensus sequence is significantly
different after this point. This region corresponds to the original 3' RACE fragment pSJ94
(SBE A + Ri adaptor) and provided evidence that there may be more than one SBE II gene in
cassava.
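Locating where two related sequences stop agreeing, as in the comparison with csbe2con, amounts to tracking identity along an alignment. The sketch below assumes the sequences are already aligned without gaps (a simplification; a real comparison would use an alignment program) and reports the first window whose identity falls below a threshold. The window size, step and threshold are illustrative values, not those used for Figure 3, and the toy sequences are hypothetical.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent identity of two equal-length, gap-free aligned sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and the same length")
    same = sum(x == y for x, y in zip(a.upper(), b.upper()))
    return 100.0 * same / len(a)


def first_divergent_window(a: str, b: str, window: int = 50, step: int = 25,
                           threshold: float = 90.0):
    """Start (0-based) of the first window with identity below threshold, or None.

    Assumes a and b are aligned position by position with no gaps.
    """
    n = min(len(a), len(b))
    for start in range(0, n - window + 1, step):
        pid = percent_identity(a[start:start + window], b[start:start + window])
        if pid < threshold:
            return start
    return None


if __name__ == "__main__":
    # Hypothetical toy sequences: identical for 60 bases, then every base differs.
    shared = "ACGT" * 15
    seq_a = shared + "ACGT" * 10
    seq_b = shared + "TGCA" * 10
    print(first_divergent_window(seq_a, seq_b, window=20, step=10))  # 50
```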

The 3' end corresponding to pSJ99 was therefore cloned as follows. 3' RACE PCR was
performed on leaf cDNA using the SBE A oligo as the gene-specific primer so that all SBE
II genes would be amplified. The cloned DNA fragments were then screened for the
presence or absence of the CSBE23 (SEQ. ID. NO. 4) primer site by PCR. Two out of 15
clones were positive with the SBE A + Ri primer pair but negative with the SBE A + CSBE23
(SEQ. ID. NO. 4) primers. The sequences of these two clones (designated pSJ101, as
shown in Figure 9) demonstrated that they were indeed from an SBE II gene and that they
were different from pSJ94. However, the overlapping regions of pSJ101 (the 3' clone) and
pSJ99 (the 5' clone) were identical, suggesting that they were derived from the same gene.
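The two-reaction screen reduces to a small decision rule: a clone that amplifies with SBE A + Ri carries an SBE II 3' RACE insert, and whether it also amplifies with SBE A + CSBE23 places it in the pSJ94 class or the pSJ101 class. The sketch below encodes that rule and tallies class frequencies. The per-clone results are hypothetical, constructed only to reproduce the reported 2-of-15 split, and the function names are illustrative.

```python
from collections import Counter


def classify(sbe_a_ri: bool, sbe_a_csbe23: bool) -> str:
    """Assign a 3' RACE clone from its two colony-PCR results.

    Assumption: a clone negative with SBE A + Ri is treated as lacking an
    SBE II insert.
    """
    if not sbe_a_ri:
        return "no SBE II insert"
    return "pSJ94 class" if sbe_a_csbe23 else "pSJ101 class"


def class_frequencies(results):
    """Map each class to (count, percent of all clones screened)."""
    counts = Counter(classify(ri, csbe23) for ri, csbe23 in results)
    total = sum(counts.values())
    return {label: (n, 100.0 * n / total) for label, n in counts.items()}


# Hypothetical screen: 15 clones, 13 positive with both pairs,
# 2 positive with SBE A + Ri only.
screen = [(True, True)] * 13 + [(True, False)] * 2

for label, (n, pct) in class_frequencies(screen).items():
    print(f"{label}: {n} clones ({pct:.1f}%)")
```

With 15 clones the resulting frequencies (about 87% and 13%) are coarse, and clone frequency is at best a rough proxy for relative mRNA abundance.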



To confirm this, a primer (CSBE218, SEQ. ID. NO. 19) was made to a region in the 3' UTR
(untranslated region) of pSJ101 and used in combination with the CSBE214 (SEQ. ID. NO.
15) primer to recover by PCR a full-length cDNA from both leaf and root cDNA. These
clones were sequenced and designated pSJ106 and pSJ107, respectively. The sequence and
predicted ORF of pSJ107 are shown in Figure 4 (SEQ. ID. NO. 28). The long ORF in
plasmid pSJ106 was found to be interrupted by a stop codon (presumably introduced in the
PCR process) approximately 1 kb from the 3' end of the gene; therefore, another cDNA
clone (designated pSJ116) was amplified in a separate reaction, cloned and sequenced.
This clone had an intact ORF (data not shown).

There were only a few differences between these two sequences: in the transit peptide,
residues 27-41 read YRRTSSCLSFNFKEA in pSJ107 and DRRTSSCLSFIFKKAA in pSJ116, and
residue 831 is L in pSJ107 but V in pSJ116.
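The transit-peptide difference can be checked directly against the pSJ107 translation. The sketch below takes residues 1-58 from the SEQ ID NO. 28 translation, extracts residues 27-41, and compares them position by position with the pSJ116 segment as printed above. It assumes there is no insertion or deletion within these 15 residues; the printed pSJ116 segment has one residue more than the 27-41 range, and the comparison simply ignores it. Names are illustrative.

```python
# Residues 1-58 of the pSJ107 precursor, read from the SEQ ID NO. 28 translation.
PSJ107_N_TERMINUS = (
    "MGHYTISGIR" "FPCAPLCKSQSTGFHG" "YRRTSSCLSFNFKEAF" "SRRVFSGKSSHESDSS"
)
# pSJ116 segment as printed in the text (16 residues).
PSJ116_SEGMENT = "DRRTSSCLSFIFKKAA"


def residues(protein: str, first: int, last: int) -> str:
    """1-based, inclusive residue range."""
    return protein[first - 1:last]


def substitutions(a: str, b: str, first_position: int):
    """(position, residue_a, residue_b) for each mismatch over the shared length.

    Assumption: a and b are aligned without gaps; zip() stops at the shorter one.
    """
    return [
        (first_position + i, x, y)
        for i, (x, y) in enumerate(zip(a, b))
        if x != y
    ]


segment = residues(PSJ107_N_TERMINUS, 27, 41)
print(segment)  # YRRTSSCLSFNFKEA
for position, old, new in substitutions(segment, PSJ116_SEGMENT, 27):
    print(f"{old}{position}{new}")
```

Under this gap-free reading there are three substitutions (Y27D, N37I and E40K). Together with L831V, they would be consistent with the four amino acid changes reported later for the leaf and root clones, although the printed strings may themselves carry transcription errors.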

An additional 740 bp of sequence of the gene corresponding to the pSJ94 clone was
isolated by 5' RACE using the primers CSBE216 (SEQ. ID. NO. 17) and 217, and was
designated pSJ125. This sequence was combined with that of pSJ94 to form a consensus
sequence "125 + 94", as shown in Figure 10. The sequence of this second gene is about
90% identical at the DNA and protein level to pSJ116, as shown in Figures 5 and 6, and is
clearly a second form of SBE II in cassava. The 3' untranslated regions of the two genes
are not related (data not shown).

It was also determined that the full-length cassava SBE II genes (from both leaf and tuber)
do encode active starch branching enzymes, since the cloned genes were able to
complement the glycogen branching enzyme-deficient E. coli mutant KV832.

Main Findings 

1) A full-length cDNA clone of a starch branching enzyme II (SBE II) gene has been
cloned from leaves and starch-storing roots of cassava. This cDNA encodes an 836 amino
acid protein (Mr 95 kDa) and is 86% identical to pea SBE I over the central conserved
domain, although the level of sequence identity over the entire coding region is lower than
86%.



2) There is more than one SBE II gene in cassava, as a second partial SBE II cDNA was
isolated which differs slightly in the protein coding region from the first gene and has no
homology in the 3' untranslated region.

3) The isolated full length cDNA from both leaves and roots encodes an active SBE as it 
complements an E. coli mutant deficient in glycogen branching enzyme as assayed by 
iodine staining. 
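The stated protein length can be cross-checked against the coding-sequence coordinates in the sequence listing, where SEQ ID NO. 28 (pSJ107) is annotated with a CDS at 21..2531. The sketch below assumes, as is conventional, that the annotated CDS includes the terminal stop codon; the function name is illustrative.

```python
def cds_summary(start: int, end: int) -> tuple:
    """Length, codon count and encoded residues for a 1-based inclusive CDS.

    Assumption: the annotated range includes the terminal stop codon.
    """
    length = end - start + 1
    if length % 3:
        raise ValueError(f"CDS length {length} is not a multiple of 3")
    codons = length // 3
    return length, codons, codons - 1


length, codons, residues = cds_summary(21, 2531)
print(length, codons, residues)  # 2511 837 836
```

The 836 residues obtained this way agree with the protein length given in finding 1).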

We have shown that there are SBE II (Class A) gene sequences present in the cassava
genome by isolating cDNA fragments using 3' and 5' RACE. From these cDNA fragments
a consensus sequence of over 3 kb could be compiled which contained one long open
reading frame (Figure 2) that is highly homologous to other SBE II (class A) genes (data
not shown). It is likely that the consensus sequence does not represent that of a single gene,
since attempts to PCR a full-length gene using primers at the 5' and 3' ends of this sequence
were not successful. In fact, screening of a number of leaf-derived 3' RACE cDNAs showed
that a second SBE II gene (clone designated pSJ101) was also expressed, which is highly
homologous within the coding region to the originally isolated cDNA (pSJ94) but has a
different 3' UTR. A full-length SBE II gene was isolated from leaves and roots by PCR
using a new primer to the 3' end of this sequence and the original primer at the 5' end of
the consensus sequence. If the frequency of clones isolated by 3' RACE PCR reflects the
abundance of the mRNAs, then this full-length gene may be expressed at lower levels
in the leaf than the pSJ94 clone (2 out of 15 clones were the former class, 13/15 the latter).
It should be noted that each class is expressed in both leaves and roots as judged by PCR
(data not shown). Sequence analysis of the predicted ORFs of the leaf and root genes
showed only a few differences (4 amino acid changes and one deletion), which could have
arisen through PCR errors; alternatively, there may be more than one nearly identical
gene expressed in these tissues.

Comparison of all known SBE II protein sequences shows that the cassava SBE II gene is
most closely related to the pea gene (Figure 8). The two proteins are 86.3% identical over
a 686 amino acid range which extends from the triple proline "elbow" (Burton et al, 1995
Plant J. 7, 3-15) to the conserved VVYA sequence immediately preceding the C-terminal
extensions (data not shown). All SBE II proteins are conserved over this range in that they
are at least 80% similar to each other. Remarkably, however, the sequence conservation
between the pea, potato and cassava SBE II proteins also extends to the N-terminal transit
peptide, especially the first 12 amino acids of the precursor protein and the region
surrounding the mature terminus of the pea protein (AKFSRDS). Because the proteins are
so similar around this region, it can be predicted that the mature terminus of the cassava
SBE II protein is likely to be GKSSHES. The precursor has a predicted molecular mass of
96 kDa and the mature protein a predicted molecular mass of 91.3 kDa. The cassava SBE II
has a short acidic tail at the C-terminus, although this is not as long or as acidic as that
found in the pea or potato proteins. The significance of this acidic tail, if any, remains to
be determined. One notable difference between the amino acid sequence of cassava SBE II
and all other SBE II proteins is the presence of the sequence NSKH at around position 697
instead of the conserved sequence DArW. Although this conserved region forms part of
a predicted α-helix (number 8) of the catalytic (β/α)8 barrel domain (Burton et al 1995,
cited previously), this difference does not abolish the SBE activity of the cassava protein, as
this gene can still complement the glycogen branching deletion mutant of E. coli. It may,
however, affect the specificity of the protein. An interesting point is that the other cassava
SBE II clone, pSJ94, has the conserved sequence Da\y.
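The predicted mature terminus can be located in the precursor with a simple motif search. The sketch below uses residues 1-58 of the SEQ ID NO. 28 translation. If processing occurs immediately before GKSSHES, as predicted above, the transit peptide would comprise residues 1-48. This is a consequence of the sequence-similarity prediction, not an experimentally mapped cleavage site, and the function name is illustrative.

```python
# Residues 1-58 of the pSJ107 precursor, read from the SEQ ID NO. 28 translation.
PRECURSOR_N_TERMINUS = (
    "MGHYTISGIR" "FPCAPLCKSQSTGFHG" "YRRTSSCLSFNFKEAF" "SRRVFSGKSSHESDSS"
)


def mature_start(precursor: str, motif: str) -> int:
    """1-based position of the first residue of the predicted mature protein.

    Assumption: processing occurs immediately before the first occurrence
    of the motif.
    """
    index = precursor.find(motif)
    if index == -1:
        raise ValueError(f"motif {motif!r} not found")
    return index + 1


start = mature_start(PRECURSOR_N_TERMINUS, "GKSSHES")
print(f"mature protein would start at residue {start}; "
      f"transit peptide = residues 1-{start - 1}")
```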

One other point of interest concerning the sequence of the SBE II gene is the presence of
an upstream ATG in the 5' UTR. This ATG could initiate a small peptide of 42 amino
acids which would terminate downstream of the predicted initiating methionine codon of
the SBE II precursor. If this does occur, then the translation of the SBE II protein from this
mRNA is likely to be inefficient, as ribosomes normally initiate at the 5'-most ATG in the
mRNA. However, the first ATG is in a poorer Kozak context than the SBE II initiator, and
it may be too close to the 5' end of the message (14 nucleotides) to initiate efficiently, thus
allowing initiation to occur at the correct ATG.
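The Kozak comparison can be made concrete with a simplified two-position rule (a purine at -3 and a G at +4 relative to the A of the ATG). This is a common shorthand rather than a full model of initiation efficiency. The sketch applies it to the SBE II initiator in SEQ ID NO. 28, whose listed 20 nt leader does not itself contain the upstream ATG. It also checks the frame relationship between two ATGs: if the 46 bp mentioned earlier is read as the spacing between the upstream ATG and the SBE II ATG, the two are out of frame, as stated. The function names and the "strong/adequate/weak" labels are hypothetical.

```python
def kozak_context(mrna: str, atg_index: int) -> str:
    """Classify an ATG (0-based index of its A) by purine at -3 and G at +4.

    Assumption: this two-position rule is only a simplified proxy for
    initiation efficiency.
    """
    seq = mrna.upper()
    if seq[atg_index:atg_index + 3] != "ATG":
        raise ValueError("no ATG at the given index")
    if atg_index < 3 or atg_index + 3 >= len(seq):
        raise ValueError("not enough flanking sequence")
    purine_minus3 = seq[atg_index - 3] in "AG"
    g_plus4 = seq[atg_index + 3] == "G"
    if purine_minus3 and g_plus4:
        return "strong"
    if purine_minus3 or g_plus4:
        return "adequate"
    return "weak"


def same_frame(upstream_atg: int, downstream_atg: int) -> bool:
    """True if two ATG start positions share a reading frame."""
    return (downstream_atg - upstream_atg) % 3 == 0


# 5' end of SEQ ID NO. 28: 20 nt leader, then the SBE II initiator at index 20.
leader_and_start = "CTCTCTAACTTCTCAGCGAAATGGGACAC"
print(kozak_context(leader_and_start, 20))  # strong (G at -3, G at +4)
print(same_frame(0, 46))                    # False: 46 is not a multiple of 3
```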



In conclusion, we have shown that cassava does have SBE II gene sequences, that they are
expressed in both leaves and tubers, and that more than one gene exists.



Example 2 
Cloning of a second full-length cassava SBE II gene



Methods 






Oligonucleotides 




CSBE219   CTTTATCTATTAAAGACTTC   (Seq ID No. 20)
CSBE220   CAAAAAAGTTTGTGACATGG   (Seq ID No. 21)
CSBE221   TCACTTTTTCCAATGCTAAT   (Seq ID No. 22)
CSBE222   TCTCATGCAATGGAACCGAC   (Seq ID No. 23)
CSBE223   CAGATGTCCTGACTCGGAAT   (Seq ID No. 24)
CSBE224   ATTCCGAGTCAGGACATCTG   (Seq ID No. 25)
CSBE225   CGCATTTCTCGCTATTGCTT   (Seq ID No. 26)
CSBE226   CACAGGCCCAAGTGAAGAAT   (Seq ID No. 27)



The 5' end of the gene corresponding to the 3' RACE clone pSJ94 was isolated in three
rounds of 5' RACE. Prior to performing the first round of 5' RACE, 5 ng of total leaf RNA
was reverse transcribed in a 20 ul reaction using conditions as described by the
manufacturer (Superscript enzyme, BRL) and 10 pmol of the SBE II gene-specific primer
CSBE23 (SEQ. ID. NO. 4). Primers were then removed and the cDNA tailed with dATP
as described above. The first round of 5' RACE used primers CSBE216 (SEQ. ID. NO. 17)
and Ro. This PCR reaction was diluted 1:20 and used as a template for a second round of
amplification using primers CSBE217 (SEQ. ID. NO. 18) and Ri. The gene-specific
primers were designed so that they would preferentially hybridise to the SBE II sequence in
pSJ94. Amplified products appeared as a smear of approximately 600-1200 bp when
subjected to electrophoresis on a 1% TAE agarose gel.
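Gene-selective RACE primers rely on matching one target perfectly while carrying mismatches against the related gene, ideally near the 3' end, where mismatches most strongly hinder extension. The sketch below scores a primer against candidate binding sites written in the primer's own 5'->3' sense. The primer is CSBE216 from the table. The "off-target" site is a hypothetical variant invented for illustration and is not the corresponding pSJ116 sequence.

```python
def mismatch_positions(primer: str, site: str) -> list:
    """1-based primer positions (from the 5' end) that differ from the site.

    The site must be written in the same 5'->3' sense as the primer,
    i.e. as the sequence the primer is expected to match exactly.
    """
    if len(primer) != len(site):
        raise ValueError("primer and site must be the same length")
    return [
        i + 1
        for i, (p, s) in enumerate(zip(primer.upper(), site.upper()))
        if p != s
    ]


def three_prime_mismatches(primer: str, site: str, window: int = 5) -> list:
    """Mismatches falling within the last `window` bases at the primer 3' end."""
    cutoff = len(primer) - window
    return [pos for pos in mismatch_positions(primer, site) if pos > cutoff]


CSBE216 = "ATCTTAGATAGTCTGCATCA"      # SEQ ID NO. 17
on_target = CSBE216                   # perfect match, by construction
off_target = "ATCTCAGATAGTCTGCAACA"   # hypothetical related-gene site

for name, site in [("on-target", on_target), ("off-target", off_target)]:
    print(name, mismatch_positions(CSBE216, site),
          three_prime_mismatches(CSBE216, site))
```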

This smear was excised and the DNA purified using a Qiaquick column (Qiagen) before
ligation to the pT7Blue vector. Several clones were sequenced and clone #7 was
designated pSJ125. New primers (CSBE219 (SEQ. ID. NO. 20) and 220 (SEQ. ID. NO.
21)) were designed to hybridise to the 5' end of pSJ125 and a second round of 5' RACE was
performed using the same CSBE23 (SEQ. ID. NO. 4) primed library. Two fragments of
600 and 800 bp were cloned and sequenced (clones 13 and 17). Primers CSBE221 (SEQ. ID.
NO. 22) and 222 (SEQ. ID. NO. 23) were designed to hybridise to the 5' sequence of the
longest clone (#13) and a third round of 5' RACE was performed on a new library (5 µg
total leaf RNA reverse transcribed with Superscript using CSBE220 (SEQ. ID. NO. 21) as
primer and then dATP-tailed with TdT from Boehringer Mannheim). Fragments of
approximately 500 bp were amplified, cloned and sequenced. Clone #13 was designated
pSJ143. The process is illustrated schematically in Figure 12.

To isolate a full-length gene as a contiguous sequence, a new primer (CSBE225, SEQ. ID.
NO. 26) was designed to hybridise to the 5' end of clone pSJ143 and used with one of the
primers (CSBE226 (SEQ. ID. NO. 27) or 23 (SEQ. ID. NO. 4)) at the 3' end of clone
pSJ94, in a PCR reaction using RoRidT17-primed leaf cDNA as template. Use of primer
CSBE226 (SEQ. ID. NO. 27) resulted in production of Clone #2 (designated pSJ144), and
use of primer CSBE23 (SEQ. ID. NO. 4) resulted in production of Clones #10 and 13
(designated pSJ145 and pSJ146, respectively). Only pSJ146 was sequenced fully.

Results 

Isolation of a second full length cassava SBE II gene 

A full-length clone for a second SBE II gene was isolated by extending the sequence of
pSJ94 in three rounds of 5' RACE, as illustrated schematically in Figure 12. In each round
of 5' RACE, primers were designed that would preferentially hybridise to the new sequence
rather than to the gene represented by pSJ116. In the final round of 5' RACE, three clones
were obtained that had the initiating methionine codon, and none of these had upstream
ATGs. The overlapping cDNA fragments (sequences of the 5' RACE clones pSJ143, 13,
pSJ125 and the 3' RACE clone pSJ94) could be assembled into a consensus sequence of
approximately 3 kb, which was designated csbe2-2.seq. This sequence contained one long
ORF with a predicted size of 848 aa (Mr 97 kDa). The full-length gene was then isolated
as a contiguous sequence by PCR amplification from RoRidT17-primed leaf cDNA using
primers at the 5' (CSBE225, SEQ. ID. NO. 26) and 3' (CSBE23 (SEQ. ID. NO. 4) or
CSBE226 (SEQ. ID. NO. 27)) ends of the RACE clones. One clone, designated pSJ146,
was sequenced, and its restriction map is shown along with the predicted amino acid
sequence in Figure 13 (SEQ. ID. NO. 31).
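Assembling overlapping RACE fragments into a consensus such as csbe2-2.seq can be illustrated with exact suffix-prefix merging. The sketch below is a deliberately minimal, hypothetical stand-in for a real assembler. It assumes the fragments are supplied in 5'->3' order and overlap exactly, whereas real reads carry PCR and sequencing errors (as noted for several clones above) and need error-tolerant alignment. The toy sequences are invented.

```python
def merge_pair(left: str, right: str, min_overlap: int = 20) -> str:
    """Join two fragments on the longest exact suffix/prefix overlap.

    Assumption: overlaps are error-free; min_overlap must be at least 1.
    """
    for k in range(min(len(left), len(right)), min_overlap - 1, -1):
        if left[-k:] == right[:k]:
            return left + right[k:]
    raise ValueError(f"no exact overlap of at least {min_overlap} bases")


def assemble(fragments: list, min_overlap: int = 20) -> str:
    """Merge fragments supplied in 5'->3' order into one contig."""
    contig = fragments[0]
    for fragment in fragments[1:]:
        contig = merge_pair(contig, fragment, min_overlap)
    return contig


# Hypothetical toy example: three overlapping pieces of one 24 nt sequence.
target = "AAAACCCCGGGGTTTTACGTACGT"
pieces = [target[:12], target[8:20], target[16:]]
print(assemble(pieces, min_overlap=4) == target)  # True
```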
Sequence homologies between SBE II genes

The two cassava genes (pSJ116 and pSJ146) share 88.8% identity at the DNA level over
the entire coding region (data not shown). The homology extends about 50 bases outside
of this region, but beyond this the untranslated regions show no similarity (data not shown).
At the protein level the two genes show 86% identity over the entire ORF (data not
shown). The two genes are more closely related to each other than to any other SBE II
gene. Between species, the pea SBE I shows the most homology to the cassava SBE II genes.

Example 3 

Construction of plant transformation vectors and transformation of cassava with
antisense starch branching enzyme genes.

This example describes in detail how a portion of the SBE II gene isolated from cassava 
may be introduced into cassava plants to create transgenic plants with altered properties. 

An 1100 bp HindIII-SacI fragment of cassava SBE II (from plasmid pSJ94) was cloned
into the HindIII-SacI sites of the plant transformation vector pSJ64 (Figure 11). This
placed the SBE II gene in an antisense orientation between the 2X 35S CaMV promoter 
and the nopaline synthase polyadenylation signal. pSJ64 is a derivative of the binary 
vector pGPTV-HYG (Becker et al, 1992 Plant Molecular Biology 20: 1195-1197) 
modified by inclusion of an approximately 750 bp fragment of pJIT60 (Guerineau et al 
1992 Plant Mol. Biol. 18, 815-818) containing the duplicated cauliflower mosaic virus 
(CaMV) 35S promoter (Cabb-JI strain, equivalent to nucleotides 7040 to 7376 duplicated 
upstream of 7040 to 7433, as described by Frank et al, 1980 Cell 21, 285-294) to replace 
the GUS coding sequence. A similar construct was made with the cassava SBE II
sequence from plasmid pSJ101.
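The size of the duplicated promoter fragment follows from the CaMV coordinates given above. Counting inclusively, the sketch below adds the duplicated 7040-7376 segment to the 7040-7433 segment. The small shortfall from "approximately 750 bp" would presumably be made up by flanking or cloning sequence in pJIT60; that is an assumption, not something stated in the text.

```python
def span(first: int, last: int) -> int:
    """Length of an inclusive, 1-based coordinate range."""
    return last - first + 1


duplicated_copy = span(7040, 7376)   # duplicated upstream segment
promoter_copy = span(7040, 7433)     # second copy, ending at nucleotide 7433
total = duplicated_copy + promoter_copy
print(duplicated_copy, promoter_copy, total)  # 337 394 731
```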

These plasmids are then introduced into Agrobacterium tumefaciens LBA4404 by a direct 
DNA uptake method (An et al, Binary vectors, In: Plant Molecular Biology Manual (eds
Gelvin and Schilperoort) AD 1988 pp 1-19) and can be used to transform cassava somatic
embryos by selecting on hygromycin as described by Li et al. (1996, Nature Biotechnology 
14, 736-740). 
SEQUENCE LISTING



(1) GENERAL INFORMATION: 



(i) APPLICANT: 

(A) NAME: National Starch and Chemical Investment Holding Corporation

(B) STREET: Suite 27, 501 Silverside Road 

(C) CITY: Wilmington 

(D) STATE: Delaware 

(E) COUNTRY: USA 

(F) POSTAL CODE (ZIP): 19809



(ii) TITLE OF INVENTION: Improvements in or Relating to Starch Content of Plants



(iii) NUMBER OF SEQUENCES: 31 



(iv) COMPUTER READABLE FORM: 

(A) MEDIUM TYPE: Floppy disk 

(B) COMPUTER: IBM PC compatible 

(C) OPERATING SYSTEM: PC-DOS/MS-DOS

(D) SOFTWARE: PatentIn Release #1.0, Version #1.30 (EPO)



(2) INFORMATION FOR SEQ ID NO: 1: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 1:

ATGGACAAGG ATATGTATGA 
20 



(2) INFORMATION FOR SEQ ID NO: 2:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 2:

GGTTTCATGA CTTCTGAGCA
20 



(2) INFORMATION FOR SEQ ID NO: 3: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 3:

TGCTCAGAAG TCATGAAACC 
20 



(2) INFORMATION FOR SEQ ID NO: 4:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 20 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 4:

TCCAGTCTCA ATATACGTCG 
20 



(2) INFORMATION FOR SEQ ID NO: 5:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 20 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 5:

AGGAGTAGAT GGTCTGTCGA 
20 



(2) INFORMATION FOR SEQ ID NO: 6:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 20 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 6:

TCATACATAT CCTTGTCCAT 
20 



(2) INFORMATION FOR SEQ ID NO: 7: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 7:

GGGTGACTTC AATGATGTAC 
20 



(2) INFORMATION FOR SEQ ID NO: 8:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 20 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 8:

GGTGTACATC ATTGAAGTCA 
20 



(2) INFORMATION FOR SEQ ID NO: 9:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 20 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 9:

AATTACTGGC TCCGTACTAC 
20 



(2) INFORMATION FOR SEQ ID NO: 10:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 20 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 10:

CATTCCAACG TGCGACTCAT 
20 



(2) INFORMATION FOR SEQ ID NO: 11: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 11:

TACCGGTAAT CTAGGTGTTG
20 

(2) INFORMATION FOR SEQ ID NO: 12: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 12: 

GGACCTTGGT TTAGATCCAA 
20 



(2) INFORMATION FOR SEQ ID NO: 13:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 13: 

ATGAGTCGCA CGTTGGAATG 
20 

(2) INFORMATION FOR SEQ ID NO: 14:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 20 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 14:

CAACACCTAG ATTACCGGTA 
20 



(2) INFORMATION FOR SEQ ID NO: 15: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 15:

TTAGTTGCGT CAGTTCTCAC 
20 



(2) INFORMATION FOR SEQ ID NO: 16:
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 16: 

AATATCTATC TCAGCCGGAG 
20 



(2) INFORMATION FOR SEQ ID NO: 17: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 17:

ATCTTAGATA GTCTGCATCA
20 



(2) INFORMATION FOR SEQ ID NO: 18: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 18:

TGGTTGTTCC CTGGAATTAC 
20 



(2) INFORMATION FOR SEQ ID NO: 19: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 19:

TGCAAGGACC GTGACATCAA 
20 



(2) INFORMATION FOR SEQ ID NO: 20:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 20 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 20:

CTTTATCTAT TAAAGACTTC 
20 



(2) INFORMATION FOR SEQ ID NO: 21:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 20 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 21:

CAAAAAAGTT TGTGACATGG 
20 



(2) INFORMATION FOR SEQ ID NO: 22: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 22: 

TCACTTTTTC CAATGCTAAT 
20 



(2) INFORMATION FOR SEQ ID NO: 23: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 23: 

TCTCATGCAA TGGAACCGAC 
20 



(2) INFORMATION FOR SEQ ID NO: 24:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 20 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 24: 

CAGATGTCCT GACTCGGAAT 
20 



(2) INFORMATION FOR SEQ ID NO: 25:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 20 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 25:

ATTCCGAGTC AGGACATCTG 
20 



(2) INFORMATION FOR SEQ ID NO: 26:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 20 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 26:

CGCATTTCTC GCTATTGCTT 
20 



(2) INFORMATION FOR SEQ ID NO: 27:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 20 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 27:

CACAGGCCCA AGTGAAGAAT 
20 



(2) INFORMATION FOR SEQ ID NO: 28: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 2588 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ix) FEATURE: 

(A) NAME/KEY: CDS

(B) LOCATION: 21..2531
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(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 28: 

CTCTCTAACT TCTCAGCGAA ATG GGA CAC TAC ACC ATA TCA GGA ATA 
CGT 50 

Met Gly His Tyr Thr lie Ser Gly lie 

Arg 

1 5 

10 

TTT CCT TGT GCT CCA CTC TGC AAA TCT CAA TCT ACC GGC TTC CAT 
GGC 98 

Phe Pro Cys Ala Pro Leu Cys Lys Ser Gin Ser Thr Gly Phe His 
Gly 

15 20 25 



TAT CGG AGG ACC TCC TCT TGC CTT TCC TTC AAC TTC AAG GAG GCG 
TTT 146 

Tyr Arg Arg Thr Ser Ser Cys Leu Ser Phe Asn Phe Lys Glu Ala 
Phe 

30 35 40 



TCT AGG AGG GTC TTC TCT GGA AAG TCA TCT CAT GAA TCT GAC TCC 
TCA 194 

Ser Arg Arg Val Phe Ser Gly Lys Ser Ser His Glu Ser Asp Ser 
Ser 

45 50 55 



AAT GTA ATG GTC ACT GCT TCT AAA AGA GTC CTT CCT GAT GGT CGG 
ATT 242 

Asn Val Met Val Thr Ala Ser Lys Arg Val Leu Pro Asp Gly Arg 
lie 

60 65 70 



GAA TGC TAT TCT TCT TCA ACA GAT CAA TTG GAA GCC CCT GGC ACA 
GTT 290 

Glu Cys Tyr Ser Ser Ser Thr Asp Gin Leu Glu Ala Pro Gly Thr 
Val 

75 80 85 

90 

TCA GAA GAA TCC CAG GTG CTT ACT GAT GTT GAG AGT CTC ATT ATG 
GAT 338 

Ser Glu Glu Ser Gin Val Leu Thr Asp Val Glu Ser Leu lie Met 
Asp 

95 100 105 

GAT AAG ATT GTT GAA GAT GAA GTA AAT AAA GAA TCT GTT CCA ATG 




CGG 386 
Asp Lys lie Val Glu 
Arg 

110 



GAG ACA GTT AGC ATC 

CCT 434 

Glu Thr Val Ser lie 

Pro 

125 



CCA CCC GGC AGA GGG 

ACA 482 

Pro Pro Gly Arg Gly 

Thr 

140 



GGC TTT CGT CAA CAC 

CTC 53 0 

Gly Phe Arg Gin His 

Leu 

155 

170 

CGA GAA GAA ATT GAC 

CGT 57 8 

Arg Glu Glu lie Asp 

Arg 

175 



GGC TAT GAA AAG TTT 

TAT 62 6 

Gly Tyr Glu Lys Phe 

Tyr 

190 



AGA GAG TGG GCA CCA 

TTC 674 

Arg Glu Trp Ala Pro 

Phe 

205 

AAT AAC TGG AAT CCT 

GGT 722 

Asn Asn Trp Asn Pro 

Gly 

220 
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Asp Glu Val Asn Lys 
115 

AGA AAA ATT GGA TCT 
Arg Lys lie Gly Ser 
130 

CAA AGA ATA TAT GAC 
Gin Arg lie Tyr Asp 
145 

CTA GAT TAC CGG TAT 
Leu Asp Tyr Arg Tyr 
160 

AAG TAT GAA GGT AGT 
Lys Tyr Glu Gly Ser 

180 

GGT TTC TCA CGC AGT 
Gly Phe Ser Arg Ser 
195 

GGA GCT ACG TGG GCT 
Gly Ala Thr Trp Ala 
210 

AAT GCA GAT GTC ATG 
Asn Ala Asp Val Met 
225 




Glu Ser Val Pro Met 
120 

AAA CCA AGG TCC ATT 
Lys Pro Arg Ser lie 
135 

ATA GAT CCA AGC TTG 
lie Asp Pro Ser Leu 
150 

TCA CAG TAC AAA AGA 
Ser Gin Tyr Lys Arg 
165 

CTG GAT GCA TTT TCT 
Leu Asp Ala Phe Ser 

185 

GAA ACA GGA ATA ACT 
Glu Thr Gly He Thr 
200 

GCA TTG ATT GGA GAT 
Ala Leu He Gly Asp 
215 

ACT CAG AAT GAG TGT 
Thr Gin Asn Glu Cys 
230 
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GTC TGG GAG ATC TTT 

ATT 770 

Val Trp Glu lie Phe 

He 

235 

250 

CCC CAT GGT TCT CGA 

AAC 818 

Pro His Gly Ser Arg 

Asn 

255 



AAA GAT TCT ATT CCT 

GGT 866 

Lys Asp Ser He Pro 

Gly 

270 



GAA CTC CCA TAT AAT 

AAG 914 

Glu Leu Pro Tyr Asn 

Lys 

285 



TAT GTG TTC AAA AAT 

ATT 962 

Tyr Val Phe Lys Asn 

He 

300 



TAT GAG TCG CAC GTT 

ACA 1010 

Tyr Glu Ser His Val 

Thr 

315 

330 

TAT GCC AAC TTT AGA 

GGC 1058 

Tyr Ala Asn Phe Arg 

Gly 

335 



TAC AAT GCT GTT CAG 



TTG CCG AAT AAT GCA 
Leu Pro Asn Asn Ala 
240 

GTA AAG ATA CGC ATG 
Val Lys He Arg Met 

260 

GCT TGG ATC AAG TTC 
Ala Trp He Lys Phe 
275 

GGC ATA TAC TAT GAT 
Gly He Tyr Tyr Asp 
290 

CCT CAG CCA AAG AGA 
Pro Gin Pro Lys Arg 
305 

GGA ATG AGT AGT ACG 
Gly Met Ser Ser Thr 
320 

GAT GAT GTG CTT CCT 
Asp Asp Val Leu Pro 

340 

CTC ATG GCT ATT CAA 



GAT GGT TCA CCA CCA 
Asp Gly Ser Pro Pro 
245 

GAT ACT CCA TCT GGC 
Asp Thr Pro Ser Gly 

265 

TCA GTT CAA GCA CCA 
Ser Val Gin Ala Pro 
280 

CCT CCC GAG GAG GAG 
Pro Pro Glu Glu Glu 
295 

CCA AAA TCA CTT CGG 
Pro Lys Ser Leu Arg 
310 

GAG CCA GTA ATT AAC 
Glu Pro Val He Asn 
325 

CGC ATC AAA AAG CTT 
Arg He Lys Lys Leu 

345 

GAG CAT TCA TAT TAT 
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GCT 1106 

Tyr Asn Ala Val Gin Leu Met Ala He Gin Glu His Ser Tyr Tyr 
Ala 

350 355 360 



AGT TTT GGG TAT CAC GTC ACA AAC TTT TAT GCA GCT AGC AGC CGA TTT 1154
Ser Phe Gly Tyr His Val Thr Asn Phe Tyr Ala Ala Ser Ser Arg Phe
365 370 375

GGA ACT CCT GAT GAT TTA AAG TCT CTA ATA GAT AAA GCT CAC GAG TTA 1202
Gly Thr Pro Asp Asp Leu Lys Ser Leu Ile Asp Lys Ala His Glu Leu
380 385 390

GGT CTT CTT GTT CTC ATG GAT ATT GTT CAT AGC CAT GCA TCA ACT AAT 1250
Gly Leu Leu Val Leu Met Asp Ile Val His Ser His Ala Ser Thr Asn
395 400 405 410

ACG TTG GAT GGG CTG AAT ATG TTT GAT GGT ACG GAT GGT CAC TAC TTT 1298
Thr Leu Asp Gly Leu Asn Met Phe Asp Gly Thr Asp Gly His Tyr Phe
415 420 425

CAC TCT GGA CCA CGG GGT CAT CAT TGG ATG TGG GAC TCT CGC CTT TTC 1346
His Ser Gly Pro Arg Gly His His Trp Met Trp Asp Ser Arg Leu Phe
430 435 440

AAC TAT GGG AGC TGG GAG GTT CTA AGG TTT CTT CTT TCA AAT GCA AGG 1394
Asn Tyr Gly Ser Trp Glu Val Leu Arg Phe Leu Leu Ser Asn Ala Arg
445 450 455

TGG TGG TTG GAT GAG TAC AAG TTT GAT GGG TTC AGA TTT GAT GGG GTG 1442
Trp Trp Leu Asp Glu Tyr Lys Phe Asp Gly Phe Arg Phe Asp Gly Val
460 465 470

ACT TCA ATG ATG TAC ACC CAT CAT GGA TTG CAG GTA GAT TTT ACC GGC 1490
Thr Ser Met Met Tyr Thr His His Gly Leu Gln Val Asp Phe Thr Gly
475 480 485 490

AAC TAC AAT GAA TAC TTT GGA TAT GCA ACT GAT GTA GAT GCT GTG GTT 1538
Asn Tyr Asn Glu Tyr Phe Gly Tyr Ala Thr Asp Val Asp Ala Val Val
495 500 505

TAT TTG ATG CTG TTG AAT GAT ATG ATT CAT GGT CTC TTC CCA GAG GCT 1586
Tyr Leu Met Leu Leu Asn Asp Met Ile His Gly Leu Phe Pro Glu Ala
510 515 520

GTC ACC ATT GGT GAA GAT GTT AGT GGA ATG CCA ACA GTT TGC ATT CCG 1634
Val Thr Ile Gly Glu Asp Val Ser Gly Met Pro Thr Val Cys Ile Pro
525 530 535

GTT GAA GAT GGT GGT GTT GGC TTT GAT TAT CGT CTC CAC ATG GCT GTT 1682
Val Glu Asp Gly Gly Val Gly Phe Asp Tyr Arg Leu His Met Ala Val
540 545 550

GCT GAT AAA TGG GTT GAG ATT ATT CAG AAG AGA GAT GAA GAT TGG AAA 1730
Ala Asp Lys Trp Val Glu Ile Ile Gln Lys Arg Asp Glu Asp Trp Lys
555 560 565 570

ATG GGT GAC ATT GTA CAT ATG CTG ACC AAC AGG CGG TGG TTG GAA AAG 1778
Met Gly Asp Ile Val His Met Leu Thr Asn Arg Arg Trp Leu Glu Lys
575 580 585

TGT GTT TCT TAT GCT GAA AGT CAT GAC CAG GCC CTT GTT GGT GAC AAA 1826
Cys Val Ser Tyr Ala Glu Ser His Asp Gln Ala Leu Val Gly Asp Lys
590 595 600



ACT ATT GCA TTT TGG CTG ATG GAC AAG GAT ATG TAT GAC TTC ATG GCT 1874
Thr Ile Ala Phe Trp Leu Met Asp Lys Asp Met Tyr Asp Phe Met Ala
605 610 615

CTT GAC AGA CCA TCT ACT CCT CTC ATA GAT CGT GGA GTA GCA TTG CAC 1922
Leu Asp Arg Pro Ser Thr Pro Leu Ile Asp Arg Gly Val Ala Leu His
620 625 630

AAA ATG ATC AGG CTT ATT ACC ATG GGA TTA GGC GGA GAA GGA TAT TTG 1970
Lys Met Ile Arg Leu Ile Thr Met Gly Leu Gly Gly Glu Gly Tyr Leu
635 640 645 650

AAT TTT ATG GGA AAT GAA TTT GGA CAC CCC GAG TGG ATT GAT TTT CCA 2018
Asn Phe Met Gly Asn Glu Phe Gly His Pro Glu Trp Ile Asp Phe Pro
655 660 665

AGA GGT GAT CTA CAT CTT CCC AGT GGT AAA TTT GTT CCT GGG AAC AAT 2066
Arg Gly Asp Leu His Leu Pro Ser Gly Lys Phe Val Pro Gly Asn Asn
670 675 680

TAC AGT TAT GAT AAA TGC CGG CGT AGG TTT GAT CTA GGC AAT TCA AAG 2114
Tyr Ser Tyr Asp Lys Cys Arg Arg Arg Phe Asp Leu Gly Asn Ser Lys
685 690 695

CAT CTG AGA TAT CAT GGA ATG CAA GAG TTT GAT CAA GCA ATT CAG CAT 2162
His Leu Arg Tyr His Gly Met Gln Glu Phe Asp Gln Ala Ile Gln His
700 705 710

CTT GAA GAA GCC TAT GGT TTC ATG ACT TCT GAG CAC CAA TAC ATA TCA 2210
Leu Glu Glu Ala Tyr Gly Phe Met Thr Ser Glu His Gln Tyr Ile Ser
715 720 725 730

CGG AAG GAT GAA AGG GAT CGG ATC ATT GTC TTC GAG AGG GGA AAC CTC 2258
Arg Lys Asp Glu Arg Asp Arg Ile Ile Val Phe Glu Arg Gly Asn Leu
735 740 745

GTT TTT GTA TTC AAT TTT CAT TGG ACT AGC AGC TAT TCG GAT TAC CGA 2306
Val Phe Val Phe Asn Phe His Trp Thr Ser Ser Tyr Ser Asp Tyr Arg
750 755 760

GTT GGC TGC TTA AAG CCA GGA AAG TAC AAG ATA GTC TTG GAT TCA GAT 2354
Val Gly Cys Leu Lys Pro Gly Lys Tyr Lys Ile Val Leu Asp Ser Asp
765 770 775

GAT CCT TTG TTT GGA GGC TTT GGC AGG CTT AGT CAT GAT GCA GAG CAC 2402
Asp Pro Leu Phe Gly Gly Phe Gly Arg Leu Ser His Asp Ala Glu His
780 785 790

TTC AGC TTT GAA GGG TGG TAC GAT AAC CGG CCT CGA TCC TTC ATG GTG 2450
Phe Ser Phe Glu Gly Trp Tyr Asp Asn Arg Pro Arg Ser Phe Met Val
795 800 805 810

TAC ACA CCA TGT AGA ACA GCA GTG GTC TAT GCT TTA GTG GAG GAT GAA 2498
Tyr Thr Pro Cys Arg Thr Ala Val Val Tyr Ala Leu Val Glu Asp Glu
815 820 825

GTG GAG AAT GAA TTG GAA CCT GTC GCC GGT TAA GATATATCTT AACAACAGGT 2551
Val Glu Asn Glu Leu Glu Pro Val Ala Gly *
830 835

TCTGAAGCAG GAATGCCATT ATTGATCTTC CTATGTT 2588



(2) INFORMATION FOR SEQ ID NO: 29:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 837 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 29: 

Met Gly His Tyr Thr Ile Ser Gly Ile Arg Phe Pro Cys Ala Pro Leu
1 5 10 15

Cys Lys Ser Gln Ser Thr Gly Phe His Gly Tyr Arg Arg Thr Ser Ser
20 25 30

Cys Leu Ser Phe Asn Phe Lys Glu Ala Phe Ser Arg Arg Val Phe Ser
35 40 45

Gly Lys Ser Ser His Glu Ser Asp Ser Ser Asn Val Met Val Thr Ala
50 55 60

Ser Lys Arg Val Leu Pro Asp Gly Arg Ile Glu Cys Tyr Ser Ser Ser
65 70 75 80

Thr Asp Gln Leu Glu Ala Pro Gly Thr Val Ser Glu Glu Ser Gln Val
85 90 95

Leu Thr Asp Val Glu Ser Leu Ile Met Asp Asp Lys Ile Val Glu Asp
100 105 110

Glu Val Asn Lys Glu Ser Val Pro Met Arg Glu Thr Val Ser Ile Arg
115 120 125

Lys Ile Gly Ser Lys Pro Arg Ser Ile Pro Pro Pro Gly Arg Gly Gln
130 135 140

Arg Ile Tyr Asp Ile Asp Pro Ser Leu Thr Gly Phe Arg Gln His Leu
145 150 155 160

Asp Tyr Arg Tyr Ser Gln Tyr Lys Arg Leu Arg Glu Glu Ile Asp Lys
165 170 175

Tyr Glu Gly Ser Leu Asp Ala Phe Ser Arg Gly Tyr Glu Lys Phe Gly
180 185 190

Phe Ser Arg Ser Glu Thr Gly Ile Thr Tyr Arg Glu Trp Ala Pro Gly
195 200 205

Ala Thr Trp Ala Ala Leu Ile Gly Asp Phe Asn Asn Trp Asn Pro Asn
210 215 220

Ala Asp Val Met Thr Gln Asn Glu Cys Gly Val Trp Glu Ile Phe Leu
225 230 235 240

Pro Asn Asn Ala Asp Gly Ser Pro Pro Ile Pro His Gly Ser Arg Val
245 250 255

Lys Ile Arg Met Asp Thr Pro Ser Gly Asn Lys Asp Ser Ile Pro Ala
260 265 270

Trp Ile Lys Phe Ser Val Gln Ala Pro Gly Glu Leu Pro Tyr Asn Gly
275 280 285

Ile Tyr Tyr Asp Pro Pro Glu Glu Glu Lys Tyr Val Phe Lys Asn Pro
290 295 300

Gln Pro Lys Arg Pro Lys Ser Leu Arg Ile Tyr Glu Ser His Val Gly
305 310 315 320

Met Ser Ser Thr Glu Pro Val Ile Asn Thr Tyr Ala Asn Phe Arg Asp
325 330 335

Asp Val Leu Pro Arg Ile Lys Lys Leu Gly Tyr Asn Ala Val Gln Leu
340 345 350

Met Ala Ile Gln Glu His Ser Tyr Tyr Ala Ser Phe Gly Tyr His Val
355 360 365

Thr Asn Phe Tyr Ala Ala Ser Ser Arg Phe Gly Thr Pro Asp Asp Leu
370 375 380

Lys Ser Leu Ile Asp Lys Ala His Glu Leu Gly Leu Leu Val Leu Met
385 390 395 400

Asp Ile Val His Ser His Ala Ser Thr Asn Thr Leu Asp Gly Leu Asn
405 410 415

Met Phe Asp Gly Thr Asp Gly His Tyr Phe His Ser Gly Pro Arg Gly
420 425 430

His His Trp Met Trp Asp Ser Arg Leu Phe Asn Tyr Gly Ser Trp Glu
435 440 445

Val Leu Arg Phe Leu Leu Ser Asn Ala Arg Trp Trp Leu Asp Glu Tyr
450 455 460

Lys Phe Asp Gly Phe Arg Phe Asp Gly Val Thr Ser Met Met Tyr Thr
465 470 475 480

His His Gly Leu Gln Val Asp Phe Thr Gly Asn Tyr Asn Glu Tyr Phe
485 490 495

Gly Tyr Ala Thr Asp Val Asp Ala Val Val Tyr Leu Met Leu Leu Asn
500 505 510

Asp Met Ile His Gly Leu Phe Pro Glu Ala Val Thr Ile Gly Glu Asp
515 520 525

Val Ser Gly Met Pro Thr Val Cys Ile Pro Val Glu Asp Gly Gly Val
530 535 540

Gly Phe Asp Tyr Arg Leu His Met Ala Val Ala Asp Lys Trp Val Glu
545 550 555 560

Ile Ile Gln Lys Arg Asp Glu Asp Trp Lys Met Gly Asp Ile Val His
565 570 575

Met Leu Thr Asn Arg Arg Trp Leu Glu Lys Cys Val Ser Tyr Ala Glu
580 585 590

Ser His Asp Gln Ala Leu Val Gly Asp Lys Thr Ile Ala Phe Trp Leu
595 600 605

Met Asp Lys Asp Met Tyr Asp Phe Met Ala Leu Asp Arg Pro Ser Thr
610 615 620

Pro Leu Ile Asp Arg Gly Val Ala Leu His Lys Met Ile Arg Leu Ile
625 630 635 640

Thr Met Gly Leu Gly Gly Glu Gly Tyr Leu Asn Phe Met Gly Asn Glu
645 650 655

Phe Gly His Pro Glu Trp Ile Asp Phe Pro Arg Gly Asp Leu His Leu
660 665 670

Pro Ser Gly Lys Phe Val Pro Gly Asn Asn Tyr Ser Tyr Asp Lys Cys
675 680 685

Arg Arg Arg Phe Asp Leu Gly Asn Ser Lys His Leu Arg Tyr His Gly
690 695 700

Met Gln Glu Phe Asp Gln Ala Ile Gln His Leu Glu Glu Ala Tyr Gly
705 710 715 720

Phe Met Thr Ser Glu His Gln Tyr Ile Ser Arg Lys Asp Glu Arg Asp
725 730 735

Arg Ile Ile Val Phe Glu Arg Gly Asn Leu Val Phe Val Phe Asn Phe
740 745 750

His Trp Thr Ser Ser Tyr Ser Asp Tyr Arg Val Gly Cys Leu Lys Pro
755 760 765

Gly Lys Tyr Lys Ile Val Leu Asp Ser Asp Asp Pro Leu Phe Gly Gly
770 775 780

Phe Gly Arg Leu Ser His Asp Ala Glu His Phe Ser Phe Glu Gly Trp
785 790 795 800

Tyr Asp Asn Arg Pro Arg Ser Phe Met Val Tyr Thr Pro Cys Arg Thr
805 810 815

Ala Val Val Tyr Ala Leu Val Glu Asp Glu Val Glu Asn Glu Leu Glu
820 825 830

Pro Val Ala Gly *
835



(2) INFORMATION FOR SEQ ID NO: 30:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 2805 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ix) FEATURE:

(A) NAME/KEY: CDS

(B) LOCATION: 131..2677

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 30:

AGTGAATTCG AGCTCGGTAC CCGGGGATCC GATTCGCATT TCTCGCTATT GCTTTCCGTT 60

TATTTCCATA TATAAAATAT CAAATCTAAT CACTTGCGCC ATTTCTATCT CTCTCCAAAC 120

TCTCACCGAA ATG GTA TAC TAC ACT GTA TCA GGC ATA CGT TTT CCT TGT 169
Met Val Tyr Tyr Thr Val Ser Gly Ile Arg Phe Pro Cys
840 845 850

GCA CCT TCA CTC TAC AAA TCT CAG CTC ACC AGC TTC CAT GGC GGT CGA 217
Ala Pro Ser Leu Tyr Lys Ser Gln Leu Thr Ser Phe His Gly Gly Arg
855 860 865

AGG ACC TCT TCT GGC CTT TCC TTC CTC TTG AAG AAG GAG CTG TTT CCT 265
Arg Thr Ser Ser Gly Leu Ser Phe Leu Leu Lys Lys Glu Leu Phe Pro
870 875 880

CGG AAG ATC TTT GCT GGA AAG TCC TCT TAT GAA TCT GAC TCC TCA AAT 313
Arg Lys Ile Phe Ala Gly Lys Ser Ser Tyr Glu Ser Asp Ser Ser Asn
885 890 895

TTA ACT GTC TCT GCA TCT GAG AAG GTC CTT GTT CCT GAT GAT CAG ATT 361
Leu Thr Val Ser Ala Ser Glu Lys Val Leu Val Pro Asp Asp Gln Ile
900 905 910

GAT GGC TCT TCT TCT TCA ACA TAT CAA TTA GAA ACC ACT GGC ACA GTT 409
Asp Gly Ser Ser Ser Ser Thr Tyr Gln Leu Glu Thr Thr Gly Thr Val
915 920 925 930

TTG GAG GAA TCC CAG GTT CTT GGT GAT GCA GAG AGT CTT GTG ATG GAA 457
Leu Glu Glu Ser Gln Val Leu Gly Asp Ala Glu Ser Leu Val Met Glu
935 940 945

GAT GAT AAG AAT GTT GAG GAG GAT GAA GTA AAA AAA GAG TCG GTT CCA 505
Asp Asp Lys Asn Val Glu Glu Asp Glu Val Lys Lys Glu Ser Val Pro
950 955 960

TTG CAT GAG ACA ATT AGC ATT GGA AAA AGT GAA TCT AAA CCA AGG TCC 553
Leu His Glu Thr Ile Ser Ile Gly Lys Ser Glu Ser Lys Pro Arg Ser
965 970 975



ATT CCT CCA CCT GGC AGT GGG CAG AGA ATA TAT GAC ATA GAT CCA AGC 601
Ile Pro Pro Pro Gly Ser Gly Gln Arg Ile Tyr Asp Ile Asp Pro Ser
980 985 990

TTG GCA GGT TTC CGT CAG CAT CTT GAC TAC CGA TAT TCA CAG TAC AAA 649
Leu Ala Gly Phe Arg Gln His Leu Asp Tyr Arg Tyr Ser Gln Tyr Lys
995 1000 1005

AGG CTG CGT GAG GAA ATT GAC AAG TAT GAA GGT GGT TTG GAT GCA TTC 697
Arg Leu Arg Glu Glu Ile Asp Lys Tyr Glu Gly Gly Leu Asp Ala Phe
1010 1015 1020 1025

TCT CGT GGA TTT GAA AAG TTT GGT TTC TTA CGC AGT GAA ACA GGA ATA 745
Ser Arg Gly Phe Glu Lys Phe Gly Phe Leu Arg Ser Glu Thr Gly Ile
1030 1035 1040

ACT TAT AGG GAA TGG GCA CCT GGA GCT ACG TGG GCT GCA CTT ATT GGA 793
Thr Tyr Arg Glu Trp Ala Pro Gly Ala Thr Trp Ala Ala Leu Ile Gly
1045 1050 1055

GAT TTC AAC AAT TGG AAT CCT AAT GCA GAT GTC ATG ACT CGG AAT GAG 841
Asp Phe Asn Asn Trp Asn Pro Asn Ala Asp Val Met Thr Arg Asn Glu
1060 1065 1070

TTT GGT GTC TGG GAG ATT TTT TTG CCA AAT AAC GCA GAT GGT TCA CCA 889
Phe Gly Val Trp Glu Ile Phe Leu Pro Asn Asn Ala Asp Gly Ser Pro
1075 1080 1085 1090

CCA ATT CCT CAT GGT TCT CGA GTA AAG ATA CGC ATG GAT ACT CCA TCT 937
Pro Ile Pro His Gly Ser Arg Val Lys Ile Arg Met Asp Thr Pro Ser
1095 1100 1105

GGC ATC AAA GAT TCA ATT CCT GCT TGG ATC AAG TTC TCA GTT CAG GCA 985
Gly Ile Lys Asp Ser Ile Pro Ala Trp Ile Lys Phe Ser Val Gln Ala
1110 1115 1120

CCT GGT GAA ATC CCA TAC AAT GCC ATA TAC TAT GAT CCA CCA AAG GAG 1033
Pro Gly Glu Ile Pro Tyr Asn Ala Ile Tyr Tyr Asp Pro Pro Lys Glu
1125 1130 1135

GAG AAG TAT GTG TTC AAA CAT CCT CAG CCA AAG AGA CCA AAA TCA CTT 1081
Glu Lys Tyr Val Phe Lys His Pro Gln Pro Lys Arg Pro Lys Ser Leu
1140 1145 1150

AGG ATT TAT GAA TCT CAT GTT GGG ATG AGT AGT ATG GAG CCA ATA ATT 1129
Arg Ile Tyr Glu Ser His Val Gly Met Ser Ser Met Glu Pro Ile Ile
1155 1160 1165 1170

AAC ACA TAT GCC AAC TTT AGA GAT GAT ATG CTT CCT CGC ATC AAA AAG 1177
Asn Thr Tyr Ala Asn Phe Arg Asp Asp Met Leu Pro Arg Ile Lys Lys
1175 1180 1185

CTT GGC TAC AAT GCT GTT CAG ATC ATG GCT ATT CAA GAG CAT TCC TAT 1225
Leu Gly Tyr Asn Ala Val Gln Ile Met Ala Ile Gln Glu His Ser Tyr
1190 1195 1200

TAT GCT AGT TTT GGG TAC CAT GTC ACA AAC TTT TTT GCA CCT AGC AGC 1273
Tyr Ala Ser Phe Gly Tyr His Val Thr Asn Phe Phe Ala Pro Ser Ser
1205 1210 1215

CGA TTT GGA ACT CCT GAT GAT TTG AAG TCT TTA ATA GAT AAA GCT CAT 1321
Arg Phe Gly Thr Pro Asp Asp Leu Lys Ser Leu Ile Asp Lys Ala His
1220 1225 1230

GAG TTA GGG CTG CTT GTT CTC ATG GAT ATT GTT CAT AGC CAT GCG TCA 1369
Glu Leu Gly Leu Leu Val Leu Met Asp Ile Val His Ser His Ala Ser
1235 1240 1245 1250

AAT AAT ACG TTG GAT GGG CTG AAC ATG TTT GAT GGT ACG GAT AGT CAC 1417
Asn Asn Thr Leu Asp Gly Leu Asn Met Phe Asp Gly Thr Asp Ser His
1255 1260 1265

TAC TTC CAC TCC GGA TCA CGG GGT CAT CAT TGG TTG TGG GAC TCT CGC 1465
Tyr Phe His Ser Gly Ser Arg Gly His His Trp Leu Trp Asp Ser Arg
1270 1275 1280

CTT TTC AAC TAT GGA AGC TGG GAG GTG CTA AGA TTT CTT CTT TCA AAT 1513
Leu Phe Asn Tyr Gly Ser Trp Glu Val Leu Arg Phe Leu Leu Ser Asn
1285 1290 1295

GCA AGA TGG TGG TTG GAA GAG TAC AGG TTT GAT GGT TTT AGA TTT GAT 1561
Ala Arg Trp Trp Leu Glu Glu Tyr Arg Phe Asp Gly Phe Arg Phe Asp
1300 1305 1310

GGG GTG ACT TCC ATG ATG TAC ACT CCC CAT GGG TTG CAG GTA GCT TTT 1609
Gly Val Thr Ser Met Met Tyr Thr Pro His Gly Leu Gln Val Ala Phe
1315 1320 1325 1330

ACT GGC AAC TAC AAT GAG TAC TTT GGA TAT GCA ACT GAT GTA GAT GCT 1657
Thr Gly Asn Tyr Asn Glu Tyr Phe Gly Tyr Ala Thr Asp Val Asp Ala
1335 1340 1345

GTG ATT TAT TTG ATG CTT GTG AAT GAT ATG ATT CAC GGT CTT TTC CCT 1705
Val Ile Tyr Leu Met Leu Val Asn Asp Met Ile His Gly Leu Phe Pro
1350 1355 1360

GAG GCT GTT ACC ATT GGT GAA GAT GTT AGC GGA AAG CCA ACA TTT TGC 1753
Glu Ala Val Thr Ile Gly Glu Asp Val Ser Gly Lys Pro Thr Phe Cys
1365 1370 1375

ATT CCA GTG GAA GAT GGT GGT GTT GGA TTT GAT TAC CGT CTC CAC ATG 1801
Ile Pro Val Glu Asp Gly Gly Val Gly Phe Asp Tyr Arg Leu His Met
1380 1385 1390

GCC ATT GCC GAT AAA TGG ATT GAG ATT CTT AAG AAG AGA GAT GAG GAC 1849
Ala Ile Ala Asp Lys Trp Ile Glu Ile Leu Lys Lys Arg Asp Glu Asp
1395 1400 1405 1410

TGG AAA ATG GGT GAC ATT GTG CAT ACA CTC ACC AAC AGA AGG TGG TTG 1897
Trp Lys Met Gly Asp Ile Val His Thr Leu Thr Asn Arg Arg Trp Leu
1415 1420 1425

GAA AAA TGT GTT GCT TAT GCT GAA AGT CAT GAC CAA GCT CTT GTT GGT 1945
Glu Lys Cys Val Ala Tyr Ala Glu Ser His Asp Gln Ala Leu Val Gly
1430 1435 1440

GAC AAA ACT ATT GCA TTT TGG CTG ATG GAC AAG GAC ATG TAC GAC TTC 1993
Asp Lys Thr Ile Ala Phe Trp Leu Met Asp Lys Asp Met Tyr Asp Phe
1445 1450 1455

ATG GCT CGT GAC AGA CCA TCT ACT CCT CTT ATA GAT CGT GGA ATA GCA 2041
Met Ala Arg Asp Arg Pro Ser Thr Pro Leu Ile Asp Arg Gly Ile Ala
1460 1465 1470

TTG CAC AAA ATG ATC AGG CTT ATT ACC ATG GGC TTA GGC GGA GAA GGA 2089
Leu His Lys Met Ile Arg Leu Ile Thr Met Gly Leu Gly Gly Glu Gly
1475 1480 1485 1490

TAT TTG AAT TTT ATG GGA AAT GAA TTT GGA CAT CCT GAG TGG ATT GAT 2137
Tyr Leu Asn Phe Met Gly Asn Glu Phe Gly His Pro Glu Trp Ile Asp
1495 1500 1505

TTT CCA AGA GGG GAT CGA CAT CTG CCC AAT GGT AAA GTA ATT CCA GGG 2185
Phe Pro Arg Gly Asp Arg His Leu Pro Asn Gly Lys Val Ile Pro Gly
1510 1515 1520

AAC AAC CAC AGT TAT GAT AAA TGC CGT CGT AGA TTT GAT CTA GGT GAT 2233
Asn Asn His Ser Tyr Asp Lys Cys Arg Arg Arg Phe Asp Leu Gly Asp
1525 1530 1535

GCA GAC TAT CTA AGA TAT CAT GGA ATG CAA GAG TTT GAT CAG GCA ATG 2281
Ala Asp Tyr Leu Arg Tyr His Gly Met Gln Glu Phe Asp Gln Ala Met
1540 1545 1550

CAA CAT CTT GAA GAA GCC TAT GGT TTC ATG ACT TCT GAG CAC CAG TAT 2329
Gln His Leu Glu Glu Ala Tyr Gly Phe Met Thr Ser Glu His Gln Tyr
1555 1560 1565 1570

ATA TCA CGG AAG GAT GAA GGA GAT CGG ATC ATT GTC TTT GAG AGG GGA 2377
Ile Ser Arg Lys Asp Glu Gly Asp Arg Ile Ile Val Phe Glu Arg Gly
1575 1580 1585

AAC CTT GTT TTT GTA TTC AAC TTT CAT TGG ACT AAC AGC TAT TCA GAT 2425
Asn Leu Val Phe Val Phe Asn Phe His Trp Thr Asn Ser Tyr Ser Asp
1590 1595 1600

TAC CGA GTT GGC TGC TTC AAG TCA GGA AAG TAC AAG ATT GTT TTG GAC 2473
Tyr Arg Val Gly Cys Phe Lys Ser Gly Lys Tyr Lys Ile Val Leu Asp
1605 1610 1615

TCG GAT GAT GGC TTG TTT GGA GGC TTC AAC AGG CTT AGT CAT GAT GCC 2521
Ser Asp Asp Gly Leu Phe Gly Gly Phe Asn Arg Leu Ser His Asp Ala
1620 1625 1630

GAG CAC TTC ACC TTT GAC GGG TGG TAT GAT AAC CGG CCT CGG TCC TTC 2569
Glu His Phe Thr Phe Asp Gly Trp Tyr Asp Asn Arg Pro Arg Ser Phe
1635 1640 1645 1650

ATG GTA TAT GCA CCA TCT AGG ACA GCA GTG GTC TAT GCT TTA GTA GAA 2617
Met Val Tyr Ala Pro Ser Arg Thr Ala Val Val Tyr Ala Leu Val Glu
1655 1660 1665

GAT GAA GAG AAT GAA GCA GAG AAT GAA GTA GAA AGT GAA GTG AAA CCA 2665
Asp Glu Glu Asn Glu Ala Glu Asn Glu Val Glu Ser Glu Val Lys Pro
1670 1675 1680

GCC TCC GGC TGA GATAGATATT TAGTAAGAGG ATCCCCTAAA GCAGGAATGG 2717
Ala Ser Gly *
1685

TTAACCTGTG CATCTGCATT GAACGACGTA TATTGAGACT GGAAATCCAT ATGACTAGTA 2777

GATCCTCTAG AGTCGACCTG CAGGCATG 2805



(2) INFORMATION FOR SEQ ID NO: 31:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 849 amino acids

(B) TYPE: amino acid
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 31:



Met Val Tyr Tyr Thr Val Ser Gly Ile Arg Phe Pro Cys Ala Pro Ser
1 5 10 15

Leu Tyr Lys Ser Gln Leu Thr Ser Phe His Gly Gly Arg Arg Thr Ser
20 25 30

Ser Gly Leu Ser Phe Leu Leu Lys Lys Glu Leu Phe Pro Arg Lys Ile
35 40 45

Phe Ala Gly Lys Ser Ser Tyr Glu Ser Asp Ser Ser Asn Leu Thr Val
50 55 60

Ser Ala Ser Glu Lys Val Leu Val Pro Asp Asp Gln Ile Asp Gly Ser
65 70 75 80

Ser Ser Ser Thr Tyr Gln Leu Glu Thr Thr Gly Thr Val Leu Glu Glu
85 90 95

Ser Gln Val Leu Gly Asp Ala Glu Ser Leu Val Met Glu Asp Asp Lys
100 105 110

Asn Val Glu Glu Asp Glu Val Lys Lys Glu Ser Val Pro Leu His Glu
115 120 125

Thr Ile Ser Ile Gly Lys Ser Glu Ser Lys Pro Arg Ser Ile Pro Pro
130 135 140

Pro Gly Ser Gly Gln Arg Ile Tyr Asp Ile Asp Pro Ser Leu Ala Gly
145 150 155 160

Phe Arg Gln His Leu Asp Tyr Arg Tyr Ser Gln Tyr Lys Arg Leu Arg
165 170 175

Glu Glu Ile Asp Lys Tyr Glu Gly Gly Leu Asp Ala Phe Ser Arg Gly
180 185 190

Phe Glu Lys Phe Gly Phe Leu Arg Ser Glu Thr Gly Ile Thr Tyr Arg
195 200 205

Glu Trp Ala Pro Gly Ala Thr Trp Ala Ala Leu Ile Gly Asp Phe Asn
210 215 220

Asn Trp Asn Pro Asn Ala Asp Val Met Thr Arg Asn Glu Phe Gly Val
225 230 235 240

Trp Glu Ile Phe Leu Pro Asn Asn Ala Asp Gly Ser Pro Pro Ile Pro
245 250 255

His Gly Ser Arg Val Lys Ile Arg Met Asp Thr Pro Ser Gly Ile Lys
260 265 270

Asp Ser Ile Pro Ala Trp Ile Lys Phe Ser Val Gln Ala Pro Gly Glu
275 280 285

Ile Pro Tyr Asn Ala Ile Tyr Tyr Asp Pro Pro Lys Glu Glu Lys Tyr
290 295 300

Val Phe Lys His Pro Gln Pro Lys Arg Pro Lys Ser Leu Arg Ile Tyr
305 310 315 320

Glu Ser His Val Gly Met Ser Ser Met Glu Pro Ile Ile Asn Thr Tyr
325 330 335

Ala Asn Phe Arg Asp Asp Met Leu Pro Arg Ile Lys Lys Leu Gly Tyr
340 345 350

Asn Ala Val Gln Ile Met Ala Ile Gln Glu His Ser Tyr Tyr Ala Ser
355 360 365

Phe Gly Tyr His Val Thr Asn Phe Phe Ala Pro Ser Ser Arg Phe Gly
370 375 380

Thr Pro Asp Asp Leu Lys Ser Leu Ile Asp Lys Ala His Glu Leu Gly
385 390 395 400

Leu Leu Val Leu Met Asp Ile Val His Ser His Ala Ser Asn Asn Thr
405 410 415

Leu Asp Gly Leu Asn Met Phe Asp Gly Thr Asp Ser His Tyr Phe His
420 425 430

Ser Gly Ser Arg Gly His His Trp Leu Trp Asp Ser Arg Leu Phe Asn
435 440 445

Tyr Gly Ser Trp Glu Val Leu Arg Phe Leu Leu Ser Asn Ala Arg Trp
450 455 460

Trp Leu Glu Glu Tyr Arg Phe Asp Gly Phe Arg Phe Asp Gly Val Thr
465 470 475 480

Ser Met Met Tyr Thr Pro His Gly Leu Gln Val Ala Phe Thr Gly Asn
485 490 495

Tyr Asn Glu Tyr Phe Gly Tyr Ala Thr Asp Val Asp Ala Val Ile Tyr
500 505 510

Leu Met Leu Val Asn Asp Met Ile His Gly Leu Phe Pro Glu Ala Val
515 520 525

Thr Ile Gly Glu Asp Val Ser Gly Lys Pro Thr Phe Cys Ile Pro Val
530 535 540

Glu Asp Gly Gly Val Gly Phe Asp Tyr Arg Leu His Met Ala Ile Ala
545 550 555 560

Asp Lys Trp Ile Glu Ile Leu Lys Lys Arg Asp Glu Asp Trp Lys Met
565 570 575

Gly Asp Ile Val His Thr Leu Thr Asn Arg Arg Trp Leu Glu Lys Cys
580 585 590

Val Ala Tyr Ala Glu Ser His Asp Gln Ala Leu Val Gly Asp Lys Thr
595 600 605

Ile Ala Phe Trp Leu Met Asp Lys Asp Met Tyr Asp Phe Met Ala Arg
610 615 620

Asp Arg Pro Ser Thr Pro Leu Ile Asp Arg Gly Ile Ala Leu His Lys
625 630 635 640

Met Ile Arg Leu Ile Thr Met Gly Leu Gly Gly Glu Gly Tyr Leu Asn
645 650 655

Phe Met Gly Asn Glu Phe Gly His Pro Glu Trp Ile Asp Phe Pro Arg
660 665 670

Gly Asp Arg His Leu Pro Asn Gly Lys Val Ile Pro Gly Asn Asn His
675 680 685

Ser Tyr Asp Lys Cys Arg Arg Arg Phe Asp Leu Gly Asp Ala Asp Tyr
690 695 700

Leu Arg Tyr His Gly Met Gln Glu Phe Asp Gln Ala Met Gln His Leu
705 710 715 720

Glu Glu Ala Tyr Gly Phe Met Thr Ser Glu His Gln Tyr Ile Ser Arg
725 730 735

Lys Asp Glu Gly Asp Arg Ile Ile Val Phe Glu Arg Gly Asn Leu Val
740 745 750

Phe Val Phe Asn Phe His Trp Thr Asn Ser Tyr Ser Asp Tyr Arg Val
755 760 765

Gly Cys Phe Lys Ser Gly Lys Tyr Lys Ile Val Leu Asp Ser Asp Asp
770 775 780

Gly Leu Phe Gly Gly Phe Asn Arg Leu Ser His Asp Ala Glu His Phe
785 790 795 800

Thr Phe Asp Gly Trp Tyr Asp Asn Arg Pro Arg Ser Phe Met Val Tyr
805 810 815

Ala Pro Ser Arg Thr Ala Val Val Tyr Ala Leu Val Glu Asp Glu Glu
820 825 830

Asn Glu Ala Glu Asn Glu Val Glu Ser Glu Val Lys Pro Ala Ser Gly
835 840 845
ABSTRACT

Title: Improvements in or Relating to Starch Content of Plants 

Disclosed is a nucleic acid sequence encoding a polypeptide having starch branching 
enzyme (SBE) activity, the encoded polypeptide comprising an effective portion of the 
amino acid sequence shown in Figure 4 (SEQ. ID. NO. 29) or Figure 13 (SEQ. ID. NO. 
31). 




Field of the Invention 



This invention relates to novel nucleic acid sequences, vectors and host cells comprising 
the nucleic acid sequence(s), to polypeptides encoded thereby, and to a method of altering 
a host cell by introducing the nucleic acid sequence(s) of the invention. 



Background to the Invention 



Starch consists of two main polysaccharides, amylose and amylopectin. Amylose is a
linear polymer containing α-1,4 linked glucose units, while amylopectin is a highly
branched polymer consisting of an α-1,4 linked glucan backbone with α-1,6 linked glucan
branches. In most plant storage reserves amylopectin constitutes about 75% of the starch
content. Amylopectin is synthesized by the concerted action of soluble starch synthase and
starch branching enzyme [α-1,4 glucan: α-1,4 glucan 6-glycosyltransferase, EC 2.4.1.18].
Starch branching enzyme (SBE) hydrolyses α-1,4 linkages and rejoins the cleaved glucan,
via an α-1,6 linkage, to an acceptor chain to produce a branched structure. The physical
properties of starch are strongly affected by the relative abundance of amylose and
amylopectin, and SBE is therefore a crucial enzyme in determining both the quantity and
quality of starches produced in plant systems.

Starches are commercially available from several plant sources including maize, potato and
cassava. Each of these starches has unique physical characteristics and properties and a
variety of possible industrial uses. In maize there are a number of naturally occurring
mutants which have altered starch composition, such as high amylopectin types ("waxy"
starches) or high amylose starches, but in potato and cassava no such mutants yet exist on a
commercial basis.
Genetic modification offers the possibility of obtaining new starches which may have
novel and potentially useful characteristics. Most of the work to date has involved potato
plants because they are amenable to genetic manipulation, i.e. they can be transformed
using Agrobacterium and regenerated easily from tissue culture. In addition, many of the
genes involved in starch biosynthesis have been cloned from potato and thus are available
as targets for genetic manipulation, for example by antisense inhibition of expression or
sense suppression.

Cassava (Manihot esculenta L. Crantz) is an important crop in the tropics, where its
starch-filled roots are used both as a food source and increasingly as a source of starch.
Cassava is a high yielding perennial crop that can grow on poor soils and is also tolerant of
drought. Cassava starch, being a root-derived starch, has properties similar but not identical
to potato starch and is composed of 20-25% amylose and 75-80% amylopectin (Rickard et al,
1991. Trop. Sci. 31, 189-207). Some of the genes involved in starch biosynthesis have been
cloned from cassava, including starch branching enzyme I (SBE I) (Salehuzzaman et al,
1994 Plant Science 98, 53-62) and granule bound starch synthase I (GBSS I)
(Salehuzzaman et al, 1993 Plant Molecular Biology 23, 947-962), and some work has been
done on their expression patterns, although only in in vitro grown plants (Salehuzzaman et
al, 1994 Plant Science 98, 53-62).

In most plants studied to date, e.g. maize (Boyer & Preiss, 1978 Biochem. Biophys. Res.
Comm. 80, 169-175), rice (Smyth, 1988 Plant Sci. 57, 1-8) and pea (Smith, Planta 175,
270-279), two forms of SBE have been identified, each encoded by a separate gene. A
recent review by Burton et al. (1995 The Plant Journal 7, 3-15) has demonstrated that the
two forms of SBE constitute distinct classes of the enzyme such that, in general, enzymes
of the same class from different plants may exhibit greater similarity than enzymes of
different classes from the same plant. In their review, Burton et al. termed the two
respective enzyme families class "A" and class "B", and the reader is referred thereto (and
to the references cited therein) for a detailed discussion of the distinctions between the two
classes. One general distinction of note would appear to be the presence, in class A SBE
molecules, of a flexible N-terminal domain, which is not found in class B molecules. The
distinctions noted by Burton et al. are relied on herein to define class A and class B SBE
molecules, which terms are to be interpreted accordingly.

Many organisations have an interest in obtaining modified cassava starches by means of
genetic modification. This is impossible to achieve, however, unless the plant is amenable
to transformation and regeneration and the starch biosynthesis genes to be targeted for
modification have been cloned. The production of transgenic cassava plants has
only recently been demonstrated (Taylor et al, 1996 Nature Biotechnology 14, 726-730;
Schopke et al, 1996 Nature Biotechnology 14, 731-735; and Li et al, 1996 Nature
Biotechnology 14, 736-740). The present invention concerns the identification, cloning
and sequencing of a starch biosynthetic gene from cassava, suitable as a target for genetic
manipulation.

Summary of the Invention 

In a first aspect the invention provides a nucleic acid sequence encoding a polypeptide
having starch branching enzyme (SBE) activity, the polypeptide comprising an effective
portion of the amino acid sequences shown in Figure 4 (SEQ. ID. NO. 29) or Figure 13
(SEQ. ID. NO. 31). The nucleic acid is conveniently in substantial isolation, especially in
isolation from other naturally associated nucleic acid sequences.

An "effective portion" of the amino acid sequences may be defined as a portion which
retains sufficient SBE activity, when expressed in E. coli KV832, to complement the
branching enzyme mutation therein. The amino acid sequences shown in Figures 4 (SEQ.
ID. NO. 29) and 13 (SEQ. ID. NO. 31) include the N-terminal transit peptide, which
comprises about the first 50 amino acid residues. As those skilled in the art will be well
aware, such a transit peptide is not essential for SBE activity. Thus the mature
polypeptide, lacking a transit peptide, may be considered as one example of an effective
portion of the amino acid sequence shown in Figure 4 (SEQ. ID. NO. 29) or Figure 13
(SEQ. ID. NO. 31).
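The relationship between a full-length sequence and its mature portion can be illustrated with a short Python sketch. It converts the three-letter amino acid codes used in the sequence listing into one-letter code and trims a putative N-terminal transit peptide. The cut point of 50 residues is an assumption drawn from the "about the first 50 amino acid residues" wording; the actual processing site is not specified here, and the function names are hypothetical.

```python
"""Sketch: three-letter listing text to one-letter code, minus a transit peptide.

TRANSIT_PEPTIDE_LENGTH is an assumed, approximate value ("about the first 50
amino acid residues"); it is not an experimentally defined cleavage site.
"""

THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}

TRANSIT_PEPTIDE_LENGTH = 50  # hypothetical cut point


def listing_to_one_letter(text: str) -> str:
    """Convert whitespace-separated three-letter codes to one-letter code.

    Residue position numbers and the stop marker '*' are skipped; any other
    unrecognised token raises KeyError.
    """
    residues = []
    for token in text.split():
        if token.isdigit() or token == "*":
            continue
        residues.append(THREE_TO_ONE[token])
    return "".join(residues)


def mature_polypeptide(protein: str, transit_length: int = TRANSIT_PEPTIDE_LENGTH) -> str:
    """Return the residues that follow the assumed transit peptide."""
    if transit_length < 0 or transit_length > len(protein):
        raise ValueError("transit_length must lie within the sequence")
    return protein[transit_length:]


if __name__ == "__main__":
    # First row of SEQ ID NO: 29 (residues 1-16), including its position numbers.
    row = "Met Gly His Tyr Thr Ile Ser Gly Ile Arg Phe Pro Cys Ala Pro Leu 1 5 10 15"
    print(listing_to_one_letter(row))  # MGHYTISGIRFPCAPL
```

Under the 50-residue assumption, the 836 residues of SEQ ID NO: 29 (stop excluded) would leave a 786-residue mature polypeptide; a different cut point only changes `transit_length`.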
Other effective portions may be obtained by effecting minor deletions in the amino acid
sequence whilst substantially preserving SBE activity. Comparison with known class A
SBE sequences, with the benefit of the disclosure herein, will enable those skilled in the art
to identify regions of the polypeptide which are less well conserved and so amenable to
minor deletion or amino acid substitution (particularly conservative amino acid
substitution) whilst substantially preserving SBE activity. Such less well-conserved
regions are generally found in the N-terminal amino acid residues (up to the triple proline
"elbow" at residues 138-140 in Figure 4 (SEQ. ID. NO. 29) and up to the proline elbow at
residues 143-145 in Figure 13 (SEQ. ID. NO. 31)) and in the last 50 or so residues at the C
terminus, in particular in the acidic C-terminal tail.
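Locating the proline elbow that bounds the variable N-terminal region amounts to a simple substring search. The Python sketch below uses the N-terminal residues transcribed from the sequence listing (residues 1-144 of SEQ ID NO: 29 and 1-145 of SEQ ID NO: 31) and reports the first PPP run as a 1-based residue range. It assumes the elbow is the first triple proline in the sequence, which holds for these two fragments but is not offered as a general rule; `find_proline_elbow` is a hypothetical helper.

```python
"""Sketch: find the triple-proline "elbow" in SBE N-terminal sequences."""

from typing import Optional, Tuple

# Residues 1-144 of SEQ ID NO: 29, one-letter code, 16 residues per string.
SEQ29_N_TERMINUS = (
    "MGHYTISGIRFPCAPL" "CKSQSTGFHGYRRTSS" "CLSFNFKEAFSRRVFS"
    "GKSSHESDSSNVMVTA" "SKRVLPDGRIECYSSS" "TDQLEAPGTVSEESQV"
    "LTDVESLIMDDKIVED" "EVNKESVPMRETVSIR" "KIGSKPRSIPPPGRGQ"
)

# Residues 1-145 of SEQ ID NO: 31, one-letter code, 16 residues per string.
SEQ31_N_TERMINUS = (
    "MVYYTVSGIRFPCAPS" "LYKSQLTSFHGGRRTS" "SGLSFLLKKELFPRKI"
    "FAGKSSYESDSSNLTV" "SASEKVLVPDDQIDGS" "SSSTYQLETTGTVLEE"
    "SQVLGDAESLVMEDDK" "NVEEDEVKKESVPLHE" "TISIGKSESKPRSIPP"
    "P"
)


def find_proline_elbow(protein: str) -> Optional[Tuple[int, int]]:
    """Return the 1-based (first, last) residue numbers of the first 'PPP', or None."""
    index = protein.find("PPP")
    if index == -1:
        return None
    return index + 1, index + 3


if __name__ == "__main__":
    print(find_proline_elbow(SEQ29_N_TERMINUS))  # (138, 140)
    print(find_proline_elbow(SEQ31_N_TERMINUS))  # (143, 145)
```

The reported ranges match the residue numbers given in the text, so everything up to the returned end position is the N-terminal region described as less well conserved.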

Conveniently the nucleic acid sequence is obtainable from cassava, preferably obtained
therefrom, and typically encodes a polypeptide obtainable from cassava. In a particular
embodiment, the encoded polypeptide may have the amino acid sequence NSKH at about
position 697 (in relation to Figure 4 (SEQ. ID. NO. 29)), which sequence appears peculiar
to an isoform of the SBE class A enzyme of cassava, other class A SBE enzymes having
the conserved sequence DA(D/E)Y (Burton et al, 1995 cited above).
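The contrast between the NSKH sequence and the conserved class A motif can be checked with a regular-expression search. The Python sketch below assumes DA(D/E)Y is written as the pattern `DA[DE]Y` and uses residues 689-704 of SEQ ID NO: 29 and SEQ ID NO: 31 as transcribed from the sequence listing; `motif_positions` is a hypothetical helper, not part of any library.

```python
"""Sketch: locate the class A SBE motif DA(D/E)Y and the cassava NSKH sequence."""

import re
from typing import Dict, Optional

CLASS_A_MOTIF = re.compile(r"DA[DE]Y")  # DA(D/E)Y as a regular expression
NSKH_MOTIF = re.compile(r"NSKH")

# Residues 689-704 (one-letter code) transcribed from the sequence listing.
SEQ29_689_704 = "RRRFDLGNSKHLRYHG"
SEQ31_689_704 = "SYDKCRRRFDLGDADY"


def motif_positions(fragment: str, first_residue: int = 1) -> Dict[str, Optional[int]]:
    """Return the 1-based start residue of each motif, or None if absent.

    `first_residue` is the position of fragment[0] in the full polypeptide.
    """
    hits: Dict[str, Optional[int]] = {}
    for name, pattern in (("DA(D/E)Y", CLASS_A_MOTIF), ("NSKH", NSKH_MOTIF)):
        match = pattern.search(fragment)
        hits[name] = first_residue + match.start() if match else None
    return hits


if __name__ == "__main__":
    print(motif_positions(SEQ29_689_704, 689))  # {'DA(D/E)Y': None, 'NSKH': 696}
    print(motif_positions(SEQ31_689_704, 689))  # {'DA(D/E)Y': 701, 'NSKH': None}
```

On these fragments the NSKH sequence begins at residue 696 of SEQ ID NO: 29, consistent with "about position 697", whereas SEQ ID NO: 31 carries the class A motif as DADY at residues 701-704.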

In a particular aspect of the invention there is provided a nucleic acid comprising a portion 
of nucleotides 21 to 2531 of the nucleic acid sequence shown in Figure 4 (SEQ. ID. NO.
28), or a functionally equivalent nucleic acid sequence. Such functionally equivalent
nucleic acid sequences include, but are not limited to, those sequences which encode
substantially the same amino acid sequence but which differ in nucleotide sequence from
that shown in Figure 4 (SEQ. ID. NO. 28) by virtue of the degeneracy of the genetic code.
For example, a nucleic acid sequence may be altered (e.g. "codon optimised") for
expression in a host other than cassava, such that the nucleotide sequence differs
substantially whilst the amino acid sequence of the encoded polypeptide is unchanged.
Other functionally equivalent nucleic acid sequences are those which will hybridise under
stringent hybridisation conditions (e.g. as described by Sambrook et al, Molecular
Cloning: A Laboratory Manual, CSH, i.e. washing with 0.1xSSC, 0.5% SDS at 68°C) with
the sequence shown in Figure 4 (SEQ. ID. NO. 28). Figure 10 shows a functionally
equivalent sequence designated "125 + 94", which includes a region corresponding to the
3' coding portion of the sequence in Figure 4 (SEQ. ID. NO. 28). Figure 13 (SEQ. ID. NO.
30) shows a functionally equivalent sequence which comprises a second complete SBE
coding sequence (the SBE-derived sequence is from nucleotides 35 to 2760, of which the
coding sequence is nucleotides 131-2677; the rest of the sequence in the figure is
vector-derived).
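The degeneracy point above can be illustrated with a short sketch. It assumes the standard nuclear genetic code and uses two made-up 15-nucleotide sequences (not taken from Figure 4) that differ at six of their fifteen positions yet both encode the pentapeptide MNSKH. The helper names `translate` and `mismatches` are illustrative only.

```python
from itertools import product

# Standard genetic code in the conventional T, C, A, G layout:
# the first codon position varies slowest, the third fastest; '*' marks a stop.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate a coding sequence in frame 1; a trailing partial codon is ignored."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


def mismatches(a: str, b: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


# Two hypothetical, synonymous coding sequences.
seq_1 = "ATGAACTCTAAACAT"
seq_2 = "ATGAATAGCAAGCAC"
print(translate(seq_1), translate(seq_2), mismatches(seq_1, seq_2))  # MNSKH MNSKH 6
```

A codon-optimised sequence is, in these terms, any sequence for which `translate` returns the same protein while the nucleotide choice follows the preferred codons of the intended host.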

Functionally equivalent DNA sequences will preferably comprise at least 200-300bp, more 
preferably 300-600bp, and will exhibit at least 88% identity (more preferably at least 90%, 
and most preferably at least 95% identity) with the corresponding region of the DNA 
sequence shown in Figures 4 (SEQ. ID. NO. 28) or 10. Those skilled in the art will readily
be able to conduct a sequence alignment between the putative functionally equivalent
sequence and those detailed in Figures 4 (SEQ. ID. NO. 28) or 10; the identity of the two
sequences is to be compared in those regions which are aligned by standard computer 
software, which aligns corresponding regions of the sequences. 
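As a minimal sketch of the identity comparison described above, the function below scores two sequences that have already been aligned by other software. It assumes gaps are written as '-' and that columns with a gap in either sequence are left out of the count; this is one common convention, and the text does not specify how gaps are to be treated. The function names and toy sequences are illustrative.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over the aligned columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = identical = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == gap or b == gap:
            continue  # gap columns are not scored
        compared += 1
        identical += a == b
    if compared == 0:
        raise ValueError("no gap-free columns to compare")
    return 100.0 * identical / compared


def meets_identity_threshold(aligned_a: str, aligned_b: str, minimum: float = 88.0) -> bool:
    """True if the aligned region reaches the minimum identity (88% by default)."""
    return percent_identity(aligned_a, aligned_b) >= minimum


print(percent_identity("ACGTACGTAC", "ACGTTCGTAC"))  # 90.0
print(percent_identity("ACGT-ACGT", "ACGTTACGA"))    # 87.5
```

The text also sets a minimum length for the compared region (at least 200-300 bp), so a fuller check would additionally require enough gap-free columns.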

In particular embodiments the nucleic acid sequence may alternatively comprise a 5' and/or 
a 3' untranslated region ("UTR"), examples of which are shown in Figures 2 and 4 (SEQ.
ID. NO. 28). Figure 9 includes a 3' UTR as nucleotides 688-1044, and Figure 10 includes a
3' UTR as nucleotides 1507-1900 (in each case from the first base after the "stop" codon
to the base immediately preceding the poly(A) tail). Any one of the
sequences defined above, or a functional equivalent thereof (as defined by hybridisation 
properties, as set out in the preceding paragraph), could be useful in sense or anti-sense 
inhibition of corresponding genes, as will be apparent to those skilled in the art. It will 
also be apparent to those skilled in the art that such regions may be modified so as to 
optimise expression in a particular type of host cell and that the 5' and/or 3' UTRs could be 
used in isolation, or in combination with a coding portion of the sequence of the invention. 
Similarly, a coding portion could be used without a 5' or a 3' UTR if desired.
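The 3' UTR boundaries used for Figures 9 and 10 (first base after the stop codon up to the base before the poly(A) tail) can be expressed as a small helper. This is a sketch under stated assumptions: positions are 1-based as in the figures, and a terminal run of at least ten A residues is treated as the poly(A) tail, a hypothetical cut-off chosen for illustration. Any A residues at the very end of the true UTR are absorbed into the tail by this rule.

```python
import re


def three_prime_utr(cdna: str, stop_codon_end: int, min_tail: int = 10) -> str:
    """Return the 3' UTR of a cDNA.

    stop_codon_end is the 1-based position of the last base of the stop codon;
    a terminal run of at least `min_tail` A residues is treated as the poly(A) tail.
    """
    seq = cdna.upper()
    tail = re.search(r"A{%d,}$" % min_tail, seq)
    end = tail.start() if tail else len(seq)
    # A 0-based slice starting at stop_codon_end begins at 1-based base stop_codon_end + 1.
    return seq[stop_codon_end:end]


def inclusive_length(first: int, last: int) -> int:
    """Length of an inclusive 1-based range such as nucleotides 688-1044."""
    return last - first + 1


toy = "ATGAAATAG" + "GCTTCC" + "A" * 15  # hypothetical: ORF, UTR, poly(A) tail
print(three_prime_utr(toy, stop_codon_end=9))                      # GCTTCC
print(inclusive_length(688, 1044), inclusive_length(1507, 1900))  # 357 394
```

By the same inclusive counting, the 3' UTRs cited above are 357 nt (Figure 9) and 394 nt (Figure 10) long.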

In a further aspect, the invention provides a replicable nucleic acid construct comprising 
any one of the nucleic acid sequences defined above. The construct will typically comprise 
a selectable marker and may allow for expression of the nucleic acid sequence of the 
invention. Conveniently the vector will comprise a promoter (especially a promoter
sequence operable in a plant and/or a promoter operable in a bacterial cell) and one or 
more regulatory signals known to those skilled in the art. 

In another aspect the invention provides a polypeptide having SBE activity, the polypeptide 
comprising an effective portion of the amino acid sequence shown in Figure 4 (SEQ. ID.
NO. 29) or Figure 13 (SEQ. ID. NO. 31). The polypeptide is conveniently one obtainable
from cassava, although it may be derived using recombinant DNA techniques. The
polypeptide is preferably in substantial isolation from other polypeptides of plant origin,
and more preferably in substantial isolation from any other polypeptides. The polypeptide
may have amino acid residues NSKH at about position 697 (in the sequence shown in
Figure 4 (SEQ. ID. NO. 29)), instead of the sequence DA D/E Y found in other SBE class
A polypeptides. The polypeptide may be used in a method of modifying starch in vitro, the 
method comprising treating starch under suitable conditions (of temperature, pH etc.) with 
an effective amount of the polypeptide. 

Those skilled in the art will appreciate that the disclosure of the present specification can 
be utilised in a number of ways. In particular, the characteristics of a host cell may be 
altered by recombinant DNA techniques. Thus, in a further aspect, there is provided a 
method by which a host cell may be altered by introduction of a nucleic acid sequence 
comprising at least 200bp and exhibiting at least 88% sequence identity (more preferably at 
least 90%, and most preferably at least 95% identity) with the corresponding region of the 
DNA sequence shown in Figures 4 (SEQ. ID. NO. 28), 9, 10 or 13 (SEQ. ID. NO. 30),
operably linked in the sense or (preferably) in the anti-sense orientation to a suitable 
promoter active in the host cell, and causing transcription of the introduced nucleic acid 
sequence, said transcript and/or the translation product thereof being sufficient to interfere 
with the expression of a homologous gene naturally present in said host cell, which 
homologous gene encodes a polypeptide having SBE activity. The altered host cell is 
typically a plant cell, such as a cell of a cassava, banana, potato, sweet potato, tomato, pea, 
wheat, barley, oat, maize, or rice plant. 

Desirably the method further comprises the introduction of one or more nucleic acid 
sequences which are effective in interfering with the expression of other homologous gene
or genes naturally present in the host cell. Such other genes whose expression is inhibited
may be involved in starch biosynthesis (e.g. an SBE I gene), or may be unrelated to SBE II.

Those skilled in the art will be aware that both anti-sense inhibition and "sense
suppression" of gene expression, especially of plant genes, have been demonstrated (e.g.
Matzke & Matzke 1995 Plant Physiol. 107, 679-685). 

It is believed that antisense methods are mainly operable by the production of antisense 
mRNA which hybridises to the sense mRNA, preventing its translation into functional 
polypeptide, possibly by causing the hybrid RNA to be degraded (e.g. Sheehy et al, 1988 
PNAS 85, 8805-8809; Van der Krol et al, Mol. Gen. Genet. 220, 204-212). Sense 
suppression also requires homology between the introduced sequence and the target gene, 
but the exact mechanism is unclear. It is apparent however that, in relation to both 
antisense and sense suppression, neither a full length nucleotide sequence, nor a "native" 
sequence is essential. Preferably the nucleic acid sequence used in the method will 
comprise at least 200-300bp, more preferably at least 300-600bp, of the full length 
sequence, but by simple trial and error other fragments (smaller or larger) may be found 
which are functional in altering the characteristics of the plant. It is also known that 
untranslated portions of sequence can suffice to inhibit expression of the homologous gene 
- coding portions may be present within the introduced sequence, but they do not appear to 
be essential under all circumstances. 

The inventors have discovered that there are at least two class A SBE genes in cassava. A 
fragment of a second gene has been isolated, which fragment directs the expression of the 
C terminal 481 amino acids of cassava class A SBE (see Figure 10) and comprises a 3' 
untranslated region. Subsequently, a complete clone of the second gene was also 
recovered (see Figure 12). The coding portions of the two genes show some slight 
differences, and the second SBE gene may be considered as functionally equivalent to the 
corresponding portion of the nucleotide sequence shown in Figure 4 (SEQ. ID. NO. 28).
However, the 3' untranslated regions of the two genes show marked differences. Thus the 
method of altering a host cell may comprise the use of a sufficient portion of either gene so 
as to inhibit the expression of the naturally occurring homologous gene. Conveniently, a 
portion of nucleotide sequence is employed which is conserved between both genes.
Alternatively, sufficient portions of both genes may be employed, typically using a single 
construct to direct the transcription of both introduced sequences. 
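One simple way to look for a stretch "conserved between both genes" is to find the longest exact substring that the two sequences share. The sketch below does this by dynamic programming. Exact identity is a stricter criterion than the text requires (sense and antisense suppression need homology, not identity), the function is illustrative rather than the procedure used by the inventors, and the toy sequences are hypothetical.

```python
def longest_common_substring(a: str, b: str) -> str:
    """Longest exact substring shared by a and b (O(len(a) * len(b)) time, O(len(b)) memory)."""
    a, b = a.upper(), b.upper()
    best_len = best_end = 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                # Extend the common run that ends at a[i-1] and b[j-1].
                curr[j] = prev[j - 1] + 1
                if curr[j] > best_len:
                    best_len, best_end = curr[j], i
        prev = curr
    return a[best_end - best_len:best_end]


shared = longest_common_substring("GGATCCAAGT", "TTATCCAAGC")
print(shared, len(shared))  # ATCCAAG 7
```

For real SBE II sequences the result would then be checked against the minimum length of about 200 bp given above for introduced sequences.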

In addition, as explained above, it may be desired to cause inhibition of expression of the 
class B SBE (i.e. SBE I) in the same host cell. A number of class B SBE gene sequences 
are known, including portions of the cassava class B SBE (Salehuzzaman et al., 1994 Plant
Science 98, 53-62) and any one of these may prove suitable. Preferably the sequence used 
is that which derives from the host cell sought to be altered (e.g. when altering the 
characteristics of a cassava plant cell, it is generally preferred to use sense or anti-sense 
sequences corresponding exactly to at least portions of the cassava gene whose expression 
is sought to be inhibited). 

In a further aspect the invention provides an altered host cell, into which has been 
introduced a nucleic acid sequence comprising at least 200bp and exhibiting at least 88% 
sequence identity (more preferably at least 90%, and most preferably at least 95% identity) 
with the corresponding region of the DNA sequence shown in Figures 4 (SEQ. ID. NO.
28), 9, 10 or 13 (SEQ. ID. NO. 30) operably linked in the sense or anti-sense orientation to
a suitable promoter, said host cell comprising a natural gene sharing sequence homology 
with the introduced sequence. 

The host cell may be a micro-organism (such as a bacterial, fungal or yeast cell) or a plant 
cell. Conveniently the host cell altered by the method is a cell of a cassava plant, or 
another plant with starch storage reserves, such as banana, potato, sweet potato, tomato, 
pea, wheat, barley, oat, maize, or rice plant. Typically the sequence will be introduced in a 
nucleic acid construct, by way of transformation, transduction, micro-injection or other 
method known to those skilled in the art. The invention also provides for a plant into 
which has been introduced a nucleic acid sequence of the invention, or the progeny of such 
a plant. 

The altered plant cell will preferably be grown into an altered plant, using techniques of
plant growth and cultivation well known to those skilled in the art of regenerating
plantlets from plant cells.

The invention also provides a method of obtaining starch from an altered plant, the plant
being obtained by the method defined above. Starch may be extracted from the plant by 
any of the known techniques (e.g. milling). The invention further provides starch 
obtainable from a plant altered by the method defined above, the starch having altered 
properties compared to starch extracted from an equivalent but unaltered plant. 
Conveniently the altered starch is obtained from an altered plant selected from the group 
consisting of cassava, potato, pea, tomato, maize, wheat, barley, oat, sweet potato and rice. 
Typically the altered starch will have increased amylose content. 

The invention will now be further described by way of illustrative examples and with 
reference to the accompanying drawings, in which:- 

Figure 1 is a schematic illustration of the cloning strategy for cassava SBE II. The top line
represents the size of a full length clone with distances in kilobases (kb) and arrows 
representing oligonucleotides (rightward pointing arrows are sense strand, leftward are on 
opposite strand). The long thick arrow is the open reading frame with start and stop 
codons shown. Below this are shown the 3' RACE, 5' RACE and PCR clones identified 
either by the plasmid name (shown in brackets above the line) or the clone number (shown 
to the left of the clone) for the 5' RACE only. Also shown (by an x) in the 5' RACE clones 
are positions of small deletions or introns. 



Figure 2 shows the DNA sequence and predicted ORF of csbe2con.seq. This sequence is a 
consensus of 3' RACE pSJ94 and 5' RACE clones 27/9,11 and 28. The first 64 base pairs 
are derived from the RoRidT17 adaptor primer/dT tail followed by the SBE sequence. The 
one long open reading frame is shown in one letter code below the double strand DNA 
sequence. Also shown is the upstream ORF (MQL...LPW). 

Figure 3 shows an alignment of the 5' region of the cassava SBE II csbe2con and pSJ99
(clones 20 and 35) DNA sequences. Differences from the consensus sequence are shaded.

Figure 4 shows the DNA sequence and predicted ORF of the full length cassava SBE II tuber
cDNA in pSJ107. The sequence shown runs from the CSBE214 (SEQ. ID. NO. 15) to the
CSBE218 (SEQ. ID. NO. 19) oligonucleotide. The DNA sequence is Seq ID No. 28
in the attached sequence listing; the amino acid sequence is Seq ID No. 29.

Figure 5 shows an alignment of the 3' region of the cassava SBE II pSJ116 and 125+94 DNA
sequences. The top line is the 125 + 94 sequence and the bottom line the pSJ116 sequence.
Identical nucleotides are indicated by the same letter in the middle line, differences are 
indicated by a gap, and dashed lines indicate gaps introduced to optimise alignment. 

Figure 6 shows an alignment of carboxy terminal region of pSJ116 and 125+94 protein 
sequences. The top sequence is from 125+94 and the bottom from pSJ116. Identical 
amino acid residues are shown with the same letter, conserved changes with a colon and 
neutral changes with a period. 

Figure 7 shows a phylogenetic tree of starch branching enzyme proteins. The length of 
each pair of branches represents the distance between sequence pairs. The scale beneath the 
tree measures the distance between sequences (units indicate the number of substitution 
events). Dotted lines indicate a negative branch length because of averaging the tree. 
Zmconl2.pro is maize SBE II, psstbl.pro is pea SBE I (Bhattacharyya et al 1990 Cell 60,
115-121) and atsbe2-1 & 2-2.pro are two SBE II proteins from Arabidopsis thaliana
(Fisher et al 1996 Plant Mol. Biol. 30, 97-108). SJ107.pro is representative of a cassava
SBE II sequence, and potsbe2.pro is a potato SBE II sequence known to the inventors.

Figure 8 is an alignment of SBE II proteins. Protein sequences are indicated in one letter
code. The top line represents the consensus sequence, below which are shown the
consensus ruler and the individual SBE II sequences. Residues matching the consensus are
shaded. Dashes represent gaps introduced to optimise alignment. Sequence identities are
shown at the right of the figure and are as in Figure 7, except that SJ107.pro is cassava
SBE II.

Figure 9 shows the DNA sequence and predicted ORF of a cassava SBE II cDNA isolated
by 3' RACE (plasmid pSJ101).

Figure 10 shows the consensus DNA sequence and predicted ORF of a second cassava 
SBE II cDNA isolated by 3' and 5' RACE (the sequence designated 125+94 is from plasmids
pSJ125 and pSJ94, spliced at the CSBE217 (SEQ. ID. NO. 18) oligonucleotide sequence).

Figure 11 is a schematic diagram of the plant transformation vector pSJ64. The black line 
represents the DNA sequence. The hashed line represents the bacterial plasmid backbone 
(containing the origin of replication and bacterial selection marker) and is not shown in 
full. The filled triangles represent the T-DNA borders (RB = right border, LB = left 
border). Relevant restriction enzyme sites are shown above the black line with the 
approximate distances (in kilobases) between sites marked by an asterisk shown
underneath. The thinnest arrows represent polyadenylation signals (pAnos = nopaline
synthase, pAg7 = Agrobacterium gene 7), the intermediate arrows represent protein coding
regions (SBE II = cassava SBE II, HYG = hygromycin resistance gene) and the thick
arrows represent promoter regions (P-2x35S = double CaMV 35S promoter, P-nos = 
nopaline synthase promoter). 

Figure 12 is a schematic illustration of the cloning strategy used to isolate a second cassava 
SBE II gene. The top line represents the size of a full length clone with distances in 
kilobases (kb) and arrows representing oligonucleotides (rightward pointing arrows are 
sense strand, leftward are on opposite strand). The long thick arrow is the open reading 
frame with start and stop codons shown. Below this are shown the 3' RACE, 5' RACE and
PCR clones identified either by the plasmid name (shown in brackets above the line) or the 
clone number (shown to the right of the clone). 

Figure 13 shows the DNA sequence and predicted ORF of a second full length cassava 
SBE II tuber cDNA in pSJ146. Nucleotides 35-2760 are SBE II sequence and the
remainder are from the pT7Blue vector. The DNA sequence of Figure 13 is Seq ID No.
30, and the amino acid sequence is Seq ID No. 31, in the attached sequence listing.

Example 1

This example relates to the isolation and cloning of SBE II sequences from cassava.

Recombinant DNA manipulations

Standard procedures were performed essentially according to Sambrook et al. (1989 
Molecular cloning A laboratory manual, 2nd edn. Cold Spring Harbor Laboratory Press, 
Cold Spring Harbor, N.Y.). DNA sequencing was performed on an ABI automated DNA 
sequencer and sequences manipulated using DNASTAR software for the Macintosh. 

Rapid Amplification of cDNA Ends ("RACE") and PCR conditions

5' and 3' RACE were performed essentially according to Frohman et al, (1988 Proc. Natl. 
Acad. Sci. USA 85, 8998-9002) but with the following modifications. 

For 3' RACE, 5 ug of total RNA was reverse transcribed using 5 pmol of the RACE 
adaptor RoRidT17 as primer and StrataScript RNase H- reverse transcriptase (50 U) in a
50 ul reaction according to the manufacturer's instructions (Stratagene). The reaction was
incubated for 1 hour at 37°C, then diluted to 200 ul with TE (10 mM Tris-HCl, 1 mM
EDTA, pH 8) and stored at 4°C. 2.5 ul of this cDNA was used in a 25 ul PCR reaction
with 12.5 pmol of SBE A (SEQ. ID. NO. 1) and Ro primers for 30 cycles of 94°C 45 sec,
50°C 25 sec, 72°C 1 min 30 sec. A second round of PCR (25 cycles) was performed using 
1 ul of this reaction as template in a 50 ul reaction under the same conditions. Amplified 
products were separated by agarose gel electrophoresis and cloned into the pT7Blue vector 
(Invitrogen). 
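The amounts carried into the first-round PCR follow from the volumes given above. The sketch assumes the template is evenly distributed after dilution and expresses it as an RNA input equivalent, not a measured cDNA yield; the helper name is illustrative.

```python
def amount_in_aliquot(total_amount: float, total_volume_ul: float, aliquot_ul: float) -> float:
    """Amount carried in an aliquot of an evenly mixed solution."""
    return total_amount * aliquot_ul / total_volume_ul


# 5 ug (5000 ng) of RNA reverse transcribed, then diluted to 200 ul; 2.5 ul goes into the PCR.
rna_equivalent_ng = amount_in_aliquot(5000.0, 200.0, 2.5)

# 12.5 pmol of each primer in a 25 ul reaction; pmol/ul is numerically equal to uM.
primer_uM = 12.5 / 25.0

# Second round: 1 ul of the first reaction used as template in a 50 ul reaction.
carry_over_fraction = 1.0 / 50.0

print(rna_equivalent_ng, primer_uM, carry_over_fraction)  # 62.5 0.5 0.02
```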

For the first round of 5' RACE, 5 ug of total leaf RNA was reverse transcribed as described
above using 10 pmol of the SBE II gene specific primer CSBE22 (SEQ. ID. NO. 3). This
primer was removed from the reaction by diluting to 500 ul with TE and centrifuging twice
through a Centricon 100 microconcentrator. The concentrated cDNA was then dA-tailed
with 9 U of terminal deoxynucleotidyl transferase and 50 uM dATP in a 20 ul reaction in
buffer supplied by the manufacturer (BRL). The reaction was incubated for 10 min at 
37°C and 5 min at 65°C and then diluted to 200 ul with TE pH 8. PCR was performed in a 
50 ul volume using 5 ul of tailed cDNA, 2.5 pmol of RoRidT17 and 25 pmol of Ro and
CSBE24 (SEQ. ID. NO. 5) primers for 30 cycles of 94°C 45 sec, 55°C 25 sec, 72°C 3 min.
Amplified products were separated on a 1% TAE agarose gel and the bands were cut out;
200 ul of TE was added and the gel slices were melted at 99°C for 10 min. Five ul of this
was re-amplified in a 50 ul volume using CSBE25 (SEQ. ID. NO. 6) and Ri as primers and
25 cycles of 94°C 45 sec, 55°C 25
sec, 72°C 1 min 30 sec. Amplified fragments were separated on a 1% TAE agarose gel, 
purified on DEAE paper and cloned into pT7Blue. 

The second round of 5' RACE was performed using the CSBE28 (SEQ. ID. NO. 9) and CSBE29
primers in the first and second round PCR reactions respectively, using a new A-tailed
cDNA library primed with CSBE27 (SEQ. ID. NO. 8).

A third round of 5' RACE was performed on the same CSBE27 (SEQ. ID. NO. 8) primed
cDNA.

Repeat 3' RACE and PCR Cloning 

The 3' RACE library (RoRidT17 primed leaf RNA) was used as a template. The first PCR 
reaction was diluted 1:20 and 1 ul was used in a 50 ul PCR reaction with SBE A (SEQ. ID.
NO. 1) and Ri primers, and the products were cloned into pT7Blue. The cloned PCR
products were screened for the presence or absence of the CSBE23 (SEQ. ID. NO. 4) oligonucleotide
by colony PCR. 

A full length cDNA of cassava SBE II was isolated by PCR from leaf or root cDNA
(RoRidT17 primed) using primers CSBE214 (SEQ. ID. NO. 15) and CSBE218 (SEQ. ID.
NO. 19), from 2-5 ul of cDNA in a 25 ul reaction and 30 cycles of 94°C 45 sec, 55°C 25
sec, 72°C 2 min. 

Complementation of E. coli mutant KV832 

SBE II-containing plasmids were transformed into the branching enzyme deficient mutant
E. coli KV832 (Keil et al, 1987 Mol. Gen. Genet. 207, 294-301) and cells were grown on solid
PYG medium (0.85% KH2PO4, 1.1% K2HPO4, 0.6% yeast extract) containing 1.0%
glucose. To test for complementation, a loop of cells was scraped off and resuspended in
150 uL water, to which 15 uL of Lugol's solution (2 g KI and 1 g I2 per 300 ml water)
was added.

RNA isolation 

RNA was isolated from cassava plants by the method of Logemann (1987 Anal. Biochem. 
163, 21-26). Leaf RNA was isolated from 0.5 g of in vitro grown plant tissue; the total
yield was 300 ug. Three month old roots (88 g) were used for isolation of root RNA.



SBE II specific oligonucleotides

SBE A      A 1 GGAG A ACjGAT atgtatga                                    (Seq ID No. 1)
CSBE21     OU 111 LA 1 UAL 1 1 G 1 GAGC A                                (Seq ID No. 2)
CSBE22     TPtPTP A A A r^Tr^ a to a a a oo 1 UL 1 LAUAAu 1 LA 1 G A AAGG  (Seq ID No. 3)
CSBE23     1 LLAu 1 G 1 G AA TATACGTCG                                   (Seq ID No. 4)
CSBE24     AUUAU 1 AGA 1 GGTCTGTCGA                                      (Seq ID No. 5)
CSBE25     TCATACATATCCTTGTCCAT                                          (Seq ID No. 6)
CSBE26     GGGTGACTTCAATGATGTAC                                          (Seq ID No. 7)
CSBE27     GGTGTACATCATTGAAGTCA                                          (Seq ID No. 8)
CSBE28     AATTACTGGCTCCGTACTAC                                          (Seq ID No. 9)
CSBE29     CATTCCAACGTGCGACTCAT                                          (Seq ID No. 10)
CSBE210    TACCGGTAATCTAGGTGTTG                                          (Seq ID No. 11)
CSBE211    GGACCTTGGTTTAGATCCAA                                          (Seq ID No. 12)
CSBE212    ATGAGTCGCACGTTGGAATG                                          (Seq ID No. 13)
CSBE213    CAACACCTAGATTACCGGTA                                          (Seq ID No. 14)
CSBE214    TTAGTTGCGTCAGTTCTCAC                                          (Seq ID No. 15)
CSBE215    AATATCTATCTCAGCCGGAG                                          (Seq ID No. 16)
CSBE216    ATCTTAGATAGTCTGCATCA                                          (Seq ID No. 17)
CSBE217    TGGTTGTTCCCTGGAATTAC                                          (Seq ID No. 18)
CSBE218    TGCAAGGACCGTGACATCAA                                          (Seq ID No. 19)
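Several of the legible primers above are exact reverse complements of one another: CSBE212 is the reverse complement of CSBE29, and CSBE213 of CSBE210. Such pairs can prime in opposite directions from the same site, although the text does not state why they were designed this way. The sketch below checks the list for such pairs; it includes only the primers whose sequences are legible (Seq ID Nos. 6-19), and the function names are illustrative.

```python
from itertools import combinations

# Legible primer sequences from the list above (Seq ID Nos. 6-19), written 5' to 3'.
PRIMERS = {
    "CSBE25": "TCATACATATCCTTGTCCAT",
    "CSBE26": "GGGTGACTTCAATGATGTAC",
    "CSBE27": "GGTGTACATCATTGAAGTCA",
    "CSBE28": "AATTACTGGCTCCGTACTAC",
    "CSBE29": "CATTCCAACGTGCGACTCAT",
    "CSBE210": "TACCGGTAATCTAGGTGTTG",
    "CSBE211": "GGACCTTGGTTTAGATCCAA",
    "CSBE212": "ATGAGTCGCACGTTGGAATG",
    "CSBE213": "CAACACCTAGATTACCGGTA",
    "CSBE214": "TTAGTTGCGTCAGTTCTCAC",
    "CSBE215": "AATATCTATCTCAGCCGGAG",
    "CSBE216": "ATCTTAGATAGTCTGCATCA",
    "CSBE217": "TGGTTGTTCCCTGGAATTAC",
    "CSBE218": "TGCAAGGACCGTGACATCAA",
}

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence written 5' to 3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def complementary_pairs(primers: dict) -> list:
    """Pairs of primer names whose sequences are exact reverse complements."""
    return [
        (a, b)
        for a, b in combinations(primers, 2)
        if reverse_complement(primers[a]) == primers[b].upper()
    ]


print(complementary_pairs(PRIMERS))  # [('CSBE29', 'CSBE212'), ('CSBE210', 'CSBE213')]
```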



RESULTS

Cloning of an SBE II gene from cassava leaf

The strategy for cloning a full length cDNA of starch branching enzyme II of cassava is 
shown in Figure 1. A comparison of several class A (SBE II) DNA sequences
identified a 23 bp region which appears to be completely conserved among most genes
(data not shown) and is positioned about one kilobase upstream from the 3' end of the
gene. An oligonucleotide primer (designated SBE A, SEQ. ID. NO. 1) was made to this
sequence and used to isolate a partial cDNA clone by 3' RACE PCR from first strand leaf
cDNA as illustrated in Figure 1. An approximately 1100 bp band was amplified, cloned
into pT7Blue vector and sequenced. This clone was designated pSJ94 and contained a
1120 bp insert starting with the SBE A (SEQ. ID. NO. 1) oligonucleotide and ending with a polyA
tail. There was a predicted open reading frame of 235 amino acids which was highly 
homologous (79% identical) to a potato SBE II also isolated by the inventors (data not 
shown) suggesting that this clone represented a class A (SBE II) gene. 
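The search for a fully conserved 23 bp region can be sketched as a scan over a multiple alignment. The sketch assumes the SBE II sequences have already been aligned to equal length with '-' for gaps, and it requires identity in every sequence, which is stricter than the "most genes" criterion in the text. The toy alignment and function name are hypothetical.

```python
def conserved_window_starts(aligned: list, width: int) -> list:
    """0-based start positions of gap-free windows identical in every aligned sequence."""
    length = len(aligned[0])
    if any(len(s) != length for s in aligned):
        raise ValueError("sequences must be aligned to the same length")
    # A column is conserved if all sequences carry the same non-gap base there.
    conserved = [
        len({s[i].upper() for s in aligned}) == 1 and aligned[0][i] != "-"
        for i in range(length)
    ]
    starts, run = [], 0
    for i, ok in enumerate(conserved):
        run = run + 1 if ok else 0
        if run >= width:
            starts.append(i - width + 1)
    return starts


toy_alignment = ["AACGTTGCA", "TACGTTGCA", "GACGTTGCT"]
print(conserved_window_starts(toy_alignment, width=4))  # [1, 2, 3, 4]
```

For the primer described above the same scan would be run with `width=23` over the aligned class A sequences.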

To obtain the sequence of a full length clone, nested primers complementary to the 5' end
of this sequence were made and used in 5' RACE PCR to isolate clones from the 5' region of
the gene. A total of three rounds of 5' RACE were needed to determine the sequence of the
complete gene (i.e. one that has a predicted long ORF preceded by stop codons). It should
be noted that during this cloning process several clones (#23, 9 and 16) were obtained that had
small deletions, and in one case (clone 23) a small (120 bp) intron was also present.
These occurrences are not uncommon and probably arise through errors in the PCR process
and/or reverse transcription of incompletely processed RNA (heterogeneous nuclear RNA).

The overlapping cDNA fragments could be assembled into a contiguous 3 kb sequence 
(designated csbe2con.seq) which contained one long predicted ORF, as shown in Figure 2.
Several clones in the last round of 5' RACE were obtained which included sequence of the
untranslated leader (UTL). All of these clones had an ORF (42 amino acids) 46 bp
upstream of, and out of frame with, the long ORF.
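Identifying the long ORF and the short out-of-frame upstream ORF amounts to scanning each reading frame for an ATG followed by an in-frame stop codon. The sketch below covers the forward strand only, reports the first ATG before each stop (internal ATGs are not reported separately), and uses a hypothetical toy sequence rather than csbe2con.seq.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq: str, min_codons: int = 1) -> list:
    """Forward-strand ORFs as (start, end, frame), 0-based with end exclusive.

    Each ORF runs from an ATG to the first in-frame stop codon (included);
    frame is start % 3 and min_codons counts the codons before the stop.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None
    return orfs


toy = "C" + "ATGCCCTAA" + "GG" + "ATG" + "AAA" * 5 + "TGA"
orfs = find_orfs(toy)
main = max(orfs, key=lambda o: o[1] - o[0])
upstream = [o for o in orfs if o[0] < main[0] and o[2] != main[2]]
print(main, upstream)  # (12, 33, 0) [(1, 10, 1)]
```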

There is more than one SBE II gene in cassava 

In order to determine if the assembled sequence represented that of a single gene, attempts 
were made to recover by PCR a full length SBE II gene using primers CSBE214 (SEQ. ID.
NO. 15) and CSBE23 (SEQ. ID. NO. 4) at the 5' and 3' ends of the csbe2con sequence
respectively. All attempts were unsuccessful using either leaf or root cDNA as template.
The PCR was therefore repeated with either the 5'-most or the 3'-most primer paired with
complementary primers along the length of the SBE II gene, to determine the size of the
largest fragment that could be amplified. With the CSBE214 (SEQ. ID. NO. 15) primer,
fragments could be amplified using primers 210, 28, 27 and 22 in order of increasing
distance, the latter primer pair amplifying a 2.2 kb band. With the 3' primer CSBE23 (SEQ.
ID. NO. 4), only primer pairs with 21 and 26 gave amplification products, the latter being
about 1200 bp. These results suggest that the original 3' RACE clone (pSJ94) is derived
from a different SBE II gene than the rest of the 5' RACE clones, even though the two
largest PCR fragments (214+22 and 26+23) overlap by 750 bp and share several primer
sites. It is likely that the sequences of the two genes start to diverge around the CSBE22
(SEQ. ID. NO. 3) primer site, such that the 3' end of the corresponding gene does not
contain the 23 primer site and therefore cannot be amplified with the 23 and 214 primers.
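The reasoning above, that a primer pair gives no product when one primer site is missing from the transcript, can be illustrated with a perfect-match in-silico PCR. This is a simplification: real priming tolerates some mismatches, and product formation also depends on cycling conditions and fragment length. The template, primer sequences and function names are hypothetical.

```python
from typing import Optional

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence written 5' to 3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def product_length(template: str, forward: str, reverse: str) -> Optional[int]:
    """Predicted amplicon length, or None if either primer site is absent.

    The forward primer must match the template strand exactly; the reverse
    primer binds the other strand, so its reverse complement must appear
    downstream of the forward site.
    """
    t = template.upper()
    f = t.find(forward.upper())
    if f < 0:
        return None
    r = t.find(reverse_complement(reverse), f + len(forward))
    if r < 0:
        return None
    return r + len(reverse) - f


template = "TT" + "ACGTAC" + "GGGG" + "AATGCC" + "TT"  # hypothetical cDNA
print(product_length(template, "ACGTAC", "GGCATT"))  # 16
print(product_length(template, "ACGTAC", "CCCCCC"))  # None
```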

To confirm this, the sequence of the longest 5' PCR fragment (214+22) from two clones 
(#20 designated pSJ99, & #35) was determined and compared to the consensus sequence 
csbe2con as shown in Figure 3. The first 2000 bases are nearly identical (the single base
changes might well be PCR errors); however, the consensus sequence is significantly
different after this point. This region corresponds to the original 3' RACE fragment pSJ94
(SBE A + Ri adaptor) and provided evidence that there may be more than one SBE II gene
in cassava.



The 3' end corresponding to pSJ99 was therefore cloned as follows. 3' RACE PCR was
performed on leaf cDNA using the SBE A oligonucleotide as the gene specific primer, so that
all SBE II genes would be amplified. The cloned DNA fragments were then screened by PCR
for the presence or absence of the CSBE23 (SEQ. ID. NO. 4) primer site. Two out of 15
clones were positive with the SBE A + Ri primer pair but negative with the SBE A + CSBE23
(SEQ. ID. NO. 4) primers. The sequence of these two clones (designated pSJ101, as
shown in Figure 9) demonstrated that they were indeed from an SBE II gene and that they
were different from pSJ94. However, the overlapping region of pSJ101 (the 3' clone) and
pSJ99 (the 5' clone) was identical, suggesting that they were derived from the same gene.



To confirm this, a primer (CSBE218, SEQ. ID. NO. 19) was made to a region in the 3' UTR
(untranslated region) of pSJ101 and used in combination with the CSBE214 (SEQ. ID. NO.
15) primer to recover by PCR a full length cDNA from both leaf and root cDNA. These
clones were sequenced and designated pSJ106 & pSJ107 respectively. The sequence and
predicted ORF of pSJ107 is shown in Figure 4 (SEQ. ID. NO. 28). The long ORF in 
plasmid pSJ106 was found to be interrupted by a stop codon (presumably introduced in the 
PCR process) approximately 1 kb from the 3' end of the gene, therefore another cDNA 
clone (designated pSJ116) was amplified in a separate reaction, cloned and sequenced. 
This clone had an intact ORF (data not shown). 

There were only a few differences between these two sequences (in the transit peptide, aa
27-41: YRRTSSCLSFNFKEA in pSJ107 to DRRTSSCLSFTFKKAA in pSJ116; and L831 in
pSJ107 to V in pSJ116).



An additional 740 bp of sequence of the gene corresponding to the pSJ94 clone was
isolated by 5' RACE using the primers CSBE216 (SEQ. ID. NO. 17) and CSBE217, and was
designated pSJ125. This sequence was combined with that of pSJ94 to form a consensus
sequence "125 + 94", as shown in Figure 10. The sequence of this second gene is about
90% identical at the DNA and protein level to pSJ116, as shown in Figures 5 and 6, and is
clearly a second form of SBE II in cassava. The 3' untranslated regions of the two genes
are not related (data not shown).

It was also determined that the full length cassava SBE II genes (from both leaf and tuber)
encode active starch branching enzymes, since the cloned genes were able to
complement the glycogen branching enzyme deficient E. coli mutant KV832.

Main Findings 

1) A full length cDNA clone of a starch branching enzyme II (SBE II) gene has been
cloned from leaves and starch storing roots of cassava. This cDNA encodes an 836 amino
acid protein (Mr 95 kD) and is 86% identical to pea SBE I over the central conserved
domain, although the level of sequence identity over the entire coding region is lower than
86%.



2) There is more than one SBE II gene in cassava, as a second partial SBE II cDNA was
isolated which differs slightly in the protein coding region from the first gene and has no
homology in the 3' untranslated region.

3) The isolated full length cDNA from both leaves and roots encodes an active SBE as it 
complements an E. coli mutant deficient in glycogen branching enzyme as assayed by 
iodine staining. 
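The molecular masses given for the encoded protein in this section (Mr 95 kD above, and the predicted precursor and mature masses) are sequence-based predictions. A minimal sketch of such a calculation sums standard average amino acid residue masses and adds one water; post-translational modifications are ignored, and the motif-based trimming helper is a hypothetical illustration of separating a mature form from its transit peptide. The toy sequences are not the cassava sequence.

```python
# Average amino acid residue masses in daltons (free amino acid minus one water).
RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.01528


def predicted_mass_kda(protein: str) -> float:
    """Predicted mass of an unmodified polypeptide in kDa; a trailing '*' stop is ignored."""
    residues = protein.upper().rstrip("*")
    return (sum(RESIDUE_MASS[aa] for aa in residues) + WATER) / 1000.0


def mature_form(precursor: str, mature_start_motif: str) -> str:
    """Hypothetical helper: drop everything before a predicted mature N-terminal motif."""
    index = precursor.find(mature_start_motif)
    if index < 0:
        raise ValueError("motif not found in precursor")
    return precursor[index:]


toy_precursor = "MAAGKSSHESW"  # hypothetical
print(mature_form(toy_precursor, "GKSSHES"))        # GKSSHESW
print(round(predicted_mass_kda("GG") * 1000, 3))  # 132.119
```

Applied to a full precursor and to its trimmed mature form, the same function gives the pair of predicted masses that such a comparison relies on.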

We have shown that there are SBE II (class A) gene sequences present in the cassava
genome by isolating cDNA fragments using 3' and 5' RACE. From these cDNA fragments
a consensus sequence of over 3 kb could be compiled which contained one long open
reading frame (Figure 2) which is highly homologous to other SBE II (class A) genes (data
not shown). It is likely that the consensus sequence does not represent that of a single gene,
since attempts to PCR a full length gene using primers at the 5' and 3' ends of this sequence
were not successful. In fact, screening of a number of leaf derived 3' RACE cDNAs showed
that a second SBE II gene (clone designated pSJ101) was also expressed, which is highly
homologous within the coding region to the originally isolated cDNA (pSJ94) but has a
different 3' UTR. A full length SBE II gene was isolated from leaves and roots by PCR
using a new primer to the 3' end of this sequence and the original sequence at the 5' end of 
the consensus sequence. If the frequency of clones isolated by 3' RACE PCR reflects the 
abundance of the mRNA levels then this full length gene may be expressed at lower levels 
in the leaf than the pSJ94 clone (2 out of 15 were the former class, 13/15 the latter). It 
should be noted that each class is expressed in both leaves and roots as judged by PCR 
(data not shown). Sequence analysis of the predicted ORF of the leaf and root genes 
showed only a few differences (4 amino acid changes and one deletion) which could have 
arisen through PCR errors or, alternatively, there may be more than one nearly identical 
gene expressed in these tissues. 
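The argument above rests on locating one long open reading frame in a compiled consensus sequence. As a hedged illustration only (not the procedure used in this work), the sketch below scans the three forward reading frames of a DNA string for the longest ATG-initiated ORF that ends in a stop codon. The function name `longest_orf` and the toy sequence are hypothetical.

```python
from typing import Optional, Tuple

STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf(seq: str) -> Optional[Tuple[int, int]]:
    """Return (start, end) of the longest ATG...stop ORF on the forward strand.

    Coordinates are 0-based and half-open; `end` includes the stop codon.
    Only the three forward frames are scanned, and reading frames that run
    off the end of the sequence without a stop codon are ignored.
    """
    seq = seq.upper()
    best: Optional[Tuple[int, int]] = None
    for frame in range(3):
        start: Optional[int] = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first in-frame ATG opens a candidate ORF
            elif start is not None and codon in STOP_CODONS:
                end = i + 3  # include the stop codon
                if best is None or end - start > best[1] - best[0]:
                    best = (start, end)
                start = None  # resume searching for the next ATG in this frame
    return best


if __name__ == "__main__":
    toy = "CCATGAAATTTTAAGG"  # hypothetical toy sequence, not from this work
    orf = longest_orf(toy)
    if orf is not None:
        start, end = orf
        print(start, end, (end - start) // 3 - 1)  # prints: 2 14 3
```

The last number is the encoded length in residues (codons minus the stop). A real analysis would normally also scan the reverse strand.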

A comparison of all known SBE II protein sequences shows that the cassava SBE II gene is most closely related to the pea gene (Figure 8). The two proteins are 86.3% identical over a 686 amino acid range which extends from the triple proline "elbow" (Burton et al, 1995 Plant J. 7, 3-15) to the conserved VVYA sequence immediately preceding the C-terminal extensions (data not shown). All SBE II proteins are conserved over this range in that they are at least 80% similar to each other. Remarkably, however, the sequence conservation between the pea, potato and cassava SBE II proteins also extends to the N-terminal transit peptide, especially the first 12 amino acids of the precursor protein and the region surrounding the mature N-terminus of the pea protein (AKFSRDS). Because the proteins are so similar around this region, the mature N-terminus of the cassava SBE II protein is predicted to be GKSSHES. The precursor has a predicted molecular mass of 96 kDa and the mature protein a predicted molecular mass of 91.3 kDa. The cassava SBE II has a short acidic tail at the C-terminus, although this is not as long or as acidic as that found in the pea or potato proteins. The significance of this acidic tail, if any, remains to be determined. One notable difference between the amino acid sequence of cassava SBE II and all other SBE II proteins is the presence of the sequence NSKH at around position 697 instead of the conserved sequence DAD/EY. Although this conserved region forms part of a predicted α-helix (number 8) of the catalytic (β/α)8 barrel domain (Burton et al 1995, cited previously), this difference does not abolish the SBE activity of the cassava protein, as this gene can still complement the glycogen branching deletion mutant of E. coli. It may, however, affect the specificity of the protein. Interestingly, the other cassava SBE II clone, pSJ94, has the conserved sequence DADY.
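Predicted masses such as the 96 kDa precursor and 91.3 kDa mature protein quoted above are typically obtained by summing average residue masses over a sequence. The sketch below shows that arithmetic under stated assumptions: it uses commonly tabulated average residue masses (in Da), adds one water for the termini, ignores modifications, and trims the precursor at a predicted mature N-terminus such as GKSSHES. All function names are hypothetical, and this is not necessarily the program the authors used, so its results may differ slightly from the quoted values.

```python
from typing import Dict

# Average residue masses in daltons (free amino acid minus one water).
AVG_RESIDUE_MASS: Dict[str, float] = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.01528  # added once for the free N- and C-termini

# The sequence listing uses three-letter codes; map them to one-letter codes.
THREE_TO_ONE: Dict[str, str] = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}


def from_three_letter(text: str) -> str:
    """Convert space-separated three-letter residue codes to one-letter code."""
    return "".join(THREE_TO_ONE[token] for token in text.split())


def average_mass_da(protein: str) -> float:
    """Average mass (Da) of an unmodified polypeptide in one-letter code."""
    return sum(AVG_RESIDUE_MASS[aa] for aa in protein.upper()) + WATER


def mature_from_motif(precursor: str, motif: str) -> str:
    """Trim a precursor at the first occurrence of a predicted mature N-terminus."""
    idx = precursor.find(motif)
    if idx < 0:
        raise ValueError("motif not found in precursor")
    return precursor[idx:]


if __name__ == "__main__":
    precursor = "MAGKSSHESW"  # hypothetical toy precursor, not SEQ ID NO: 29
    mature = mature_from_motif(precursor, from_three_letter("Gly Lys Ser Ser His Glu Ser"))
    print(mature, round(average_mass_da(precursor) / 1000, 3), round(average_mass_da(mature) / 1000, 3))
```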

One other point of interest concerning the sequence of the SBE II gene is the presence of an upstream ATG in the 5' UTR. This ATG could initiate a small peptide of 42 amino acids which would terminate downstream of the predicted initiating methionine codon of the SBE II precursor. If this occurs, translation of the SBE II protein from this mRNA is likely to be inefficient, as ribosomes normally initiate at the 5'-most ATG in the mRNA. However, the first ATG is in a poorer Kozak context than the SBE II initiator, and it may be too close to the 5' end of the message (14 nucleotides) to initiate efficiently, thus allowing initiation to occur at the correct ATG.
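The reasoning above turns on two checks: whether an ATG lies upstream of the annotated start codon, and how favourable its Kozak context is. A minimal sketch, assuming the common simplification that a "strong" context has a purine (A/G) at position -3 and G at +4 relative to the A of the ATG (+1); the function names and the toy mRNA are hypothetical, and this is not the analysis used in this work.

```python
from typing import List


def upstream_atgs(mrna: str, main_start: int) -> List[int]:
    """0-based positions of every ATG that starts before the annotated start codon."""
    mrna = mrna.upper()
    return [i for i in range(main_start) if mrna[i:i + 3] == "ATG"]


def kozak_strength(mrna: str, atg_pos: int) -> str:
    """Rough Kozak classification from positions -3 and +4 (A of ATG is +1).

    'strong'   : purine at -3 and G at +4
    'adequate' : only one of the two
    'weak'     : neither
    """
    mrna = mrna.upper()
    minus3 = mrna[atg_pos - 3] if atg_pos >= 3 else ""
    plus4 = mrna[atg_pos + 3] if atg_pos + 3 < len(mrna) else ""
    hits = int(minus3 in ("A", "G")) + int(plus4 == "G")
    return {2: "strong", 1: "adequate", 0: "weak"}[hits]


if __name__ == "__main__":
    # Hypothetical toy mRNA: a poor-context ATG at 3 and a consensus-like ATG at 13.
    mrna = "CCTATGCGCCACCATGGCC"
    main_start = 13
    for pos in upstream_atgs(mrna, main_start):
        # With 0-based indexing, `pos` is also the number of nucleotides 5' of the ATG.
        print(pos, kozak_strength(mrna, pos))
    print(main_start, kozak_strength(mrna, main_start))
```

Real Kozak assessments weigh further positions; the two-position rule is only a convenient first screen.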

In conclusion we have shown that cassava does have SBE II gene sequences, that they are 
expressed in both leaves and tubers and that more than one gene exists. 



Example 2 
Cloning of a second full length cassava SBE II gene


Methods 






Oligonucleotides

CSBE219 CTTTATCTATTAAAGACTTC (SEQ ID No. 20)

CSBE220 CAAAAAAGTTTGTGACATGG (SEQ ID No. 21)

CSBE221 TCACTTTTTCCAATGCTAAT (SEQ ID No. 22)

CSBE222 TCTCATGCAATGGAACCGAC (SEQ ID No. 23)

CSBE223 CAGATGTCCTGACTCGGAAT (SEQ ID No. 24)

CSBE224 ATTCCGAGTCAGGACATCTG (SEQ ID No. 25)

CSBE225 CGCATTTCTCGCTATTGCTT (SEQ ID No. 26)

CSBE226 CACAGGCCCAAGTGAAGAAT (SEQ ID No. 27)
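Some primers in this listing are exact reverse complements of one another (for example CSBE223 and CSBE224), so they anneal to the same site but prime in opposite directions. The sketch below, with a hypothetical helper name, checks such relationships; it assumes unambiguous A/C/G/T bases only.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an unambiguous DNA sequence (A/C/G/T only)."""
    return seq.translate(COMPLEMENT)[::-1]


primers = {
    "CSBE223": "CAGATGTCCTGACTCGGAAT",  # SEQ ID No. 24
    "CSBE224": "ATTCCGAGTCAGGACATCTG",  # SEQ ID No. 25
}

if __name__ == "__main__":
    rc = reverse_complement(primers["CSBE223"])
    print(rc, rc == primers["CSBE224"])  # prints: ATTCCGAGTCAGGACATCTG True
```

The same check shows that SEQ ID NO: 1 and SEQ ID NO: 6 in the sequence listing form another such pair.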



The 5' end of the gene corresponding to the 3' RACE clone pSJ94 was isolated in three rounds of 5' RACE. Prior to performing the first round of 5' RACE, 5 µg of total leaf RNA was reverse transcribed in a 20 µl reaction using conditions as described by the manufacturer (Superscript enzyme, BRL) and 10 pmol of the SBE II gene specific primer CSBE23 (SEQ. ID. NO. 4). Primers were then removed and the cDNA tailed with dATP as described above. The first round of 5' RACE used primers CSBE216 (SEQ. ID. NO. 11) and Ro. This PCR reaction was diluted 1:20 and used as a template for a second round of amplification using primers CSBE217 (SEQ. ID. NO. 18) and Ri. The gene specific primers were designed so that they would preferentially hybridise to the SBE II sequence in pSJ94. Amplified products appeared as a smear of approximately 600-1200 bp when subjected to electrophoresis on a 1% TAE agarose gel.

This smear was excised and the DNA purified using a Qiaquick column (Qiagen) before ligation to the pT7Blue vector. Several clones were sequenced and clone #7 was designated pSJ125. New primers (CSBE219 (SEQ. ID. NO. 20) and CSBE220 (SEQ. ID. NO. 21)) were designed to hybridise to the 5' end of pSJ125, and a second round of 5' RACE was performed using the same CSBE23 (SEQ. ID. NO. 4) primed library. Two fragments of 600 and 800 bp were cloned and sequenced (clones 13 and 17). Primers CSBE221 (SEQ. ID. NO. 22) and CSBE222 (SEQ. ID. NO. 23) were designed to hybridise to the 5' sequence of the longest clone (#13), and a third round of 5' RACE was performed on a new library (5 µg total leaf RNA reverse transcribed with Superscript using CSBE220 (SEQ. ID. NO. 21) as primer and then dATP tailed with TdT from Boehringer Mannheim). Fragments of approximately 500 bp were amplified, cloned and sequenced. Clone #13 was designated pSJ143. The process is illustrated schematically in Figure 12.

To isolate a full length gene as a contiguous sequence, a new primer (CSBE225, SEQ. ID. NO. 26) was designed to hybridise to the 5' end of clone pSJ143 and used with one of the primers in the 3' end of clone pSJ94 (CSBE226 (SEQ. ID. NO. 27) or CSBE23 (SEQ. ID. NO. 4)) in a PCR reaction using RoRidT17 primed leaf cDNA as template. Use of primer CSBE226 (SEQ. ID. NO. 27) resulted in production of clone #2 (designated pSJ144), and use of primer CSBE23 (SEQ. ID. NO. 4) resulted in production of clones #10 and #13 (designated pSJ145 and pSJ146 respectively). Only pSJ146 was sequenced fully.
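The full-length PCR above pairs a forward primer at the 5' end (CSBE225) with a reverse primer near the 3' end (CSBE226 or CSBE23). A hedged sketch of how the expected product can be predicted in silico: locate the forward primer on the sense strand, then locate the reverse complement of the reverse primer downstream of it. It assumes exact primer matches and a sense-strand template; real primers tolerate some mismatches. The function name and toy template are hypothetical.

```python
from typing import Optional

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an unambiguous DNA sequence (A/C/G/T only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def predicted_amplicon(template: str, forward: str, reverse: str) -> Optional[str]:
    """Product expected from exact-match primers on a sense-strand template, or None."""
    template = template.upper()
    f = template.find(forward.upper())
    if f < 0:
        return None
    # The reverse primer anneals where its reverse complement appears on the sense strand.
    site = reverse_complement(reverse)
    r = template.find(site, f + len(forward))
    if r < 0:
        return None
    return template[f:r + len(site)]


if __name__ == "__main__":
    template = "GGATGCAAATTTCCCGGGTT"  # hypothetical toy template
    print(predicted_amplicon(template, "ATGC", "CGGGA"))  # prints: ATGCAAATTTCCCG
```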

Results 

Isolation of a second full length cassava SBE II gene

A full length clone for a second SBE II gene was isolated by extending the sequence of pSJ94 in three rounds of 5' RACE, as illustrated schematically in Figure 12. In each round of 5' RACE, primers were designed that would preferentially hybridise to the new sequence rather than to the gene represented by pSJ116. In the final round of 5' RACE, three clones were obtained that had the initiating methionine codon, and none of these had upstream ATGs. The overlapping cDNA fragments (sequences of the 5' RACE clones pSJ143, #13 and pSJ125 and the 3' RACE clone pSJ94) could be assembled into a consensus sequence of approximately 3 kb, which was designated csbe2-2.seq. This sequence contained one long ORF with a predicted size of 848 aa (Mr 97 kDa). The full length gene was then isolated as a contiguous sequence by PCR amplification from RoRidT17 primed leaf cDNA using primers at the 5' (CSBE225, SEQ. ID. NO. 26) and 3' (CSBE23 (SEQ. ID. NO. 4) or CSBE226 (SEQ. ID. NO. 27)) ends of the RACE clones. One clone, designated pSJ146, was sequenced, and its restriction map is shown along with the predicted amino acid sequence in Figure 13 (SEQ. ID. NO. 31).
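The consensus csbe2-2.seq was built by joining overlapping RACE fragments. The idea can be illustrated with a greedy suffix-prefix merge: a minimal sketch assuming error-free fragments on the same strand, supplied in 5' to 3' order, with a hypothetical minimum-overlap parameter. Real assemblies must also tolerate sequencing and PCR errors, which this sketch does not.

```python
from typing import List


def merge_pair(upstream: str, downstream: str, min_overlap: int = 20) -> str:
    """Join two fragments on the longest exact suffix/prefix overlap of at least min_overlap."""
    max_k = min(len(upstream), len(downstream))
    for k in range(max_k, min_overlap - 1, -1):
        if upstream.endswith(downstream[:k]):
            return upstream + downstream[k:]
    raise ValueError("no sufficient overlap between fragments")


def assemble(fragments_5_to_3: List[str], min_overlap: int = 20) -> str:
    """Fold an ordered list of overlapping fragments into one contiguous sequence."""
    contig = fragments_5_to_3[0]
    for fragment in fragments_5_to_3[1:]:
        contig = merge_pair(contig, fragment, min_overlap)
    return contig


if __name__ == "__main__":
    # Hypothetical toy fragments; the real RACE clones overlap by hundreds of bases.
    print(assemble(["AAACCCGGG", "CCCGGGTTT", "GGTTTAAA"], min_overlap=3))
```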
Sequence homologies between SBE II genes

The two cassava genes (pSJ116 and pSJ146) share 88.8% identity at the DNA level over the entire coding region (data not shown). The homology extends about 50 bases beyond this region, but further out the untranslated regions show no similarity (data not shown). At the protein level the two genes show 86% identity over the entire ORF (data not shown). The two genes are more closely related to each other than to any other SBE II. Between species, pea SBE I shows the most homology to the cassava SBE II genes.
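Identity figures such as 88.8% (DNA) and 86% (protein) depend on how the aligned positions are counted. A minimal sketch, assuming the two sequences have already been aligned (gaps written as '-') and that identity is the number of identical positions divided by the number of aligned columns in which at least one sequence has a residue. Other conventions, such as dividing by the length of the shorter sequence, give different numbers. The function name is hypothetical, and the sketch does not perform the alignment itself.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity of two pre-aligned sequences of equal length ('-' = gap)."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = 0
    identical = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == "-" and y == "-":
            continue  # column is empty in both sequences; ignore it
        columns += 1
        if x == y:  # a gap opposite a residue never counts as identical
            identical += 1
    if columns == 0:
        raise ValueError("no aligned columns")
    return 100.0 * identical / columns


if __name__ == "__main__":
    print(percent_identity("AC-GT", "ACTGT"))  # prints: 80.0
```

For scale, 86.3% identity over the 686-residue range compared with the pea protein corresponds to roughly 592 identical positions.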

Example 3 

Construction of plant transformation vectors and transformation of cassava with 
antisense starch branching enzyme genes. 

This example describes in detail how a portion of the SBE II gene isolated from cassava 
may be introduced into cassava plants to create transgenic plants with altered properties. 

An 1100 bp HindIII-SacI fragment of cassava SBE II (from plasmid pSJ94) was cloned into the HindIII-SacI sites of the plant transformation vector pSJ64 (Figure 11). This placed the SBE II gene in an antisense orientation between the 2X 35S CaMV promoter and the nopaline synthase polyadenylation signal. pSJ64 is a derivative of the binary vector pGPTV-HYG (Becker et al, 1992 Plant Molecular Biology 20: 1195-1197) modified by inclusion of an approximately 750 bp fragment of pJIT60 (Guerineau et al 1992 Plant Mol. Biol. 18, 815-818) containing the duplicated cauliflower mosaic virus (CaMV) 35S promoter (Cabb-JI strain, equivalent to nucleotides 7040 to 7376 duplicated upstream of 7040 to 7433, as described by Frank et al, 1980 Cell 21, 285-294) to replace the GUS coding sequence. A similar construct was made with the cassava SBE II sequence from plasmid pSJ101.

These plasmids are then introduced into Agrobacterium tumefaciens LBA4404 by a direct 
DNA uptake method (An et al, Binary vectors, In: Plant Molecular Biology Manual (ed 
Gelvin and Schilperoort) AD 1988 pp 1-19) and can be used to transform cassava somatic 
embryos by selecting on hygromycin as described by Li et al (1996, Nature Biotechnology 
14, 736-740). 
SEQUENCE LISTING 



(1) GENERAL INFORMATION: 



(i) APPLICANT: 

(A) NAME: National Starch and Chemical Investment Holding Corporation

(B) STREET: Suite 27, 501 Silverside Road

(C) CITY: Wilmington

(D) STATE: Delaware

(E) COUNTRY: USA

(F) POSTAL CODE (ZIP): 19809

(ii) TITLE OF INVENTION: Improvements in or Relating to Starch Content of Plants

(iii) NUMBER OF SEQUENCES: 31

(iv) COMPUTER READABLE FORM:

(A) MEDIUM TYPE: Floppy disk

(B) COMPUTER: IBM PC compatible

(C) OPERATING SYSTEM: PC-DOS/MS-DOS

(D) SOFTWARE: PatentIn Release #1.0, Version #1.30 (EPO)



(2) INFORMATION FOR SEQ ID NO: 1:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 20 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 1:

ATGGACAAGG ATATGTATGA 
20 



(2) INFORMATION FOR SEQ ID NO: 2:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 20 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 2:

GGTTTCATGA CTTCTGAGCA 
20 



(2) INFORMATION FOR SEQ ID NO: 3:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 20 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 3:

TGCTCAGAAG TCATGAAACC 
20 



(2) INFORMATION FOR SEQ ID NO: 4:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 20 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 4:

TCCAGTCTCA ATATACGTCG 
20 



(2) INFORMATION FOR SEQ ID NO: 5: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 5:

AGGAGTAGAT GGTCTGTCGA 
20 



(2) INFORMATION FOR SEQ ID NO: 6:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 



(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 6: 

TCATACATAT CCTTGTCCAT 
20 



(2) INFORMATION FOR SEQ ID NO: 7:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 20 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 7: 

GGGTGACTTC AATGATGTAC 
20 



(2) INFORMATION FOR SEQ ID NO: 8: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 8: 

GGTGTACATC ATTGAAGTCA 
20 



(2) INFORMATION FOR SEQ ID NO: 9:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 9:

AATTACTGGC TCCGTACTAC 
20 



(2) INFORMATION FOR SEQ ID NO: 10: 
(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 20 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 10:

CATTCCAACG TGCGACTCAT 
20 



(2) INFORMATION FOR SEQ ID NO: 11:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 11:

TACCGGTAAT CTAGGTGTTG 
20 

(2) INFORMATION FOR SEQ ID NO: 12: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 12:

GGACCTTGGT TTAGATCCAA 
20 



(2) INFORMATION FOR SEQ ID NO: 13: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 13:

ATGAGTCGCA CGTTGGAATG 
20 
(2) INFORMATION FOR SEQ ID NO: 14:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 20 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 14:

CAACACCTAG ATTACCGGTA 
20 



(2) INFORMATION FOR SEQ ID NO: 15: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 15:

TTAGTTGCGT CAGTTCTCAC 
20 



(2) INFORMATION FOR SEQ ID NO: 16:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 20 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 16:

AATATCTATC TCAGCCGGAG 
20 



(2) INFORMATION FOR SEQ ID NO: 17: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 17:

ATCTTAGATA GTCTGCATCA 
20 



(2) INFORMATION FOR SEQ ID NO: 18: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 18:

TGGTTGTTCC CTGGAATTAC 
20 



(2) INFORMATION FOR SEQ ID NO: 19:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 20 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 19:

TGCAAGGACC GTGACATCAA 
20 



(2) INFORMATION FOR SEQ ID NO: 20: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 20:

CTTTATCTAT TAAAGACTTC 
20 



(2) INFORMATION FOR SEQ ID NO: 21:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 20 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 21:

CAAAAAAGTT TGTGACATGG 
20 



(2) INFORMATION FOR SEQ ID NO: 22: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 22:

TCACTTTTTC CAATGCTAAT 
20 



(2) INFORMATION FOR SEQ ID NO: 23: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 23:

TCTCATGCAA TGGAACCGAC 
20 



(2) INFORMATION FOR SEQ ID NO: 24: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 24:

CAGATGTCCT GACTCGGAAT 
20 

(2) INFORMATION FOR SEQ ID NO: 25: 
(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 20 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 25:

ATTCCGAGTC AGGACATCTG 
20 



(2) INFORMATION FOR SEQ ID NO: 26:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 20 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 26:

CGCATTTCTC GCTATTGCTT 
20 



(2) INFORMATION FOR SEQ ID NO: 27:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 27: 

CACAGGCCCA AGTGAAGAAT 
20 



(2) INFORMATION FOR SEQ ID NO: 28:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 2588 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ix) FEATURE:

(A) NAME/KEY: CDS

(B) LOCATION: 21..2531

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 28:

CTCTCTAACT TCTCAGCGAA ATG GGA CAC TAC ACC ATA TCA GGA ATA CGT 50
Met Gly His Tyr Thr Ile Ser Gly Ile Arg
1 5 10

TTT CCT TGT GCT CCA CTC TGC AAA TCT CAA TCT ACC GGC TTC CAT GGC 98
Phe Pro Cys Ala Pro Leu Cys Lys Ser Gln Ser Thr Gly Phe His Gly
15 20 25

TAT CGG AGG ACC TCC TCT TGC CTT TCC TTC AAC TTC AAG GAG GCG TTT 146
Tyr Arg Arg Thr Ser Ser Cys Leu Ser Phe Asn Phe Lys Glu Ala Phe
30 35 40

TCT AGG AGG GTC TTC TCT GGA AAG TCA TCT CAT GAA TCT GAC TCC TCA 194
Ser Arg Arg Val Phe Ser Gly Lys Ser Ser His Glu Ser Asp Ser Ser
45 50 55

AAT GTA ATG GTC ACT GCT TCT AAA AGA GTC CTT CCT GAT GGT CGG ATT 242
Asn Val Met Val Thr Ala Ser Lys Arg Val Leu Pro Asp Gly Arg Ile
60 65 70

GAA TGC TAT TCT TCT TCA ACA GAT CAA TTG GAA GCC CCT GGC ACA GTT 290
Glu Cys Tyr Ser Ser Ser Thr Asp Gln Leu Glu Ala Pro Gly Thr Val
75 80 85 90

TCA GAA GAA TCC CAG GTG CTT ACT GAT GTT GAG AGT CTC ATT ATG GAT 338
Ser Glu Glu Ser Gln Val Leu Thr Asp Val Glu Ser Leu Ile Met Asp
95 100 105

GAT AAG ATT GTT GAA GAT GAA GTA AAT AAA GAA TCT GTT CCA ATG CGG 386
Asp Lys Ile Val Glu Asp Glu Val Asn Lys Glu Ser Val Pro Met Arg
110 115 120

GAG ACA GTT AGC ATC AGA AAA ATT GGA TCT AAA CCA AGG TCC ATT CCT 434
Glu Thr Val Ser Ile Arg Lys Ile Gly Ser Lys Pro Arg Ser Ile Pro
125 130 135

CCA CCC GGC AGA GGG CAA AGA ATA TAT GAC ATA GAT CCA AGC TTG ACA 482
Pro Pro Gly Arg Gly Gln Arg Ile Tyr Asp Ile Asp Pro Ser Leu Thr
140 145 150

GGC TTT CGT CAA CAC CTA GAT TAC CGG TAT TCA CAG TAC AAA AGA CTC 530
Gly Phe Arg Gln His Leu Asp Tyr Arg Tyr Ser Gln Tyr Lys Arg Leu
155 160 165 170

CGA GAA GAA ATT GAC AAG TAT GAA GGT AGT CTG GAT GCA TTT TCT CGT 578
Arg Glu Glu Ile Asp Lys Tyr Glu Gly Ser Leu Asp Ala Phe Ser Arg
175 180 185

GGC TAT GAA AAG TTT GGT TTC TCA CGC AGT GAA ACA GGA ATA ACT TAT 626
Gly Tyr Glu Lys Phe Gly Phe Ser Arg Ser Glu Thr Gly Ile Thr Tyr
190 195 200

AGA GAG TGG GCA CCA GGA GCT ACG TGG GCT GCA TTG ATT GGA GAT TTC 674
Arg Glu Trp Ala Pro Gly Ala Thr Trp Ala Ala Leu Ile Gly Asp Phe
205 210 215

AAT AAC TGG AAT CCT AAT GCA GAT GTC ATG ACT CAG AAT GAG TGT GGT 722
Asn Asn Trp Asn Pro Asn Ala Asp Val Met Thr Gln Asn Glu Cys Gly
220 225 230

GTC TGG GAG ATC TTT TTG CCG AAT AAT GCA GAT GGT TCA CCA CCA ATT 770
Val Trp Glu Ile Phe Leu Pro Asn Asn Ala Asp Gly Ser Pro Pro Ile
235 240 245 250

CCC CAT GGT TCT CGA GTA AAG ATA CGC ATG GAT ACT CCA TCT GGC AAC 818
Pro His Gly Ser Arg Val Lys Ile Arg Met Asp Thr Pro Ser Gly Asn
255 260 265

AAA GAT TCT ATT CCT GCT TGG ATC AAG TTC TCA GTT CAA GCA CCA GGT 866
Lys Asp Ser Ile Pro Ala Trp Ile Lys Phe Ser Val Gln Ala Pro Gly
270 275 280

GAA CTC CCA TAT AAT GGC ATA TAC TAT GAT CCT CCC GAG GAG GAG AAG 914
Glu Leu Pro Tyr Asn Gly Ile Tyr Tyr Asp Pro Pro Glu Glu Glu Lys
285 290 295

TAT GTG TTC AAA AAT CCT CAG CCA AAG AGA CCA AAA TCA CTT CGG ATT 962
Tyr Val Phe Lys Asn Pro Gln Pro Lys Arg Pro Lys Ser Leu Arg Ile
300 305 310

TAT GAG TCG CAC GTT GGA ATG AGT AGT ACG GAG CCA GTA ATT AAC ACA 1010
Tyr Glu Ser His Val Gly Met Ser Ser Thr Glu Pro Val Ile Asn Thr
315 320 325 330

TAT GCC AAC TTT AGA GAT GAT GTG CTT CCT CGC ATC AAA AAG CTT GGC 1058
Tyr Ala Asn Phe Arg Asp Asp Val Leu Pro Arg Ile Lys Lys Leu Gly
335 340 345

TAC AAT GCT GTT CAG CTC ATG GCT ATT CAA GAG CAT TCA TAT TAT GCT 1106
Tyr Asn Ala Val Gln Leu Met Ala Ile Gln Glu His Ser Tyr Tyr Ala
350 355 360

AGT TTT GGG TAT CAC GTC ACA AAC TTT TAT GCA GCT AGC AGC CGA TTT 1154
Ser Phe Gly Tyr His Val Thr Asn Phe Tyr Ala Ala Ser Ser Arg Phe
365 370 375

GGA ACT CCT GAT GAT TTA AAG TCT CTA ATA GAT AAA GCT CAC GAG TTA 1202
Gly Thr Pro Asp Asp Leu Lys Ser Leu Ile Asp Lys Ala His Glu Leu
380 385 390

GGT CTT CTT GTT CTC ATG GAT ATT GTT CAT AGC CAT GCA TCA ACT AAT 1250
Gly Leu Leu Val Leu Met Asp Ile Val His Ser His Ala Ser Thr Asn
395 400 405 410

ACG TTG GAT GGG CTG AAT ATG TTT GAT GGT ACG GAT GGT CAC TAC TTT 1298
Thr Leu Asp Gly Leu Asn Met Phe Asp Gly Thr Asp Gly His Tyr Phe
415 420 425

CAC TCT GGA CCA CGG GGT CAT CAT TGG ATG TGG GAC TCT CGC CTT TTC 1346
His Ser Gly Pro Arg Gly His His Trp Met Trp Asp Ser Arg Leu Phe
430 435 440

AAC TAT GGG AGC TGG GAG GTT CTA AGG TTT CTT CTT TCA AAT GCA AGG 1394
Asn Tyr Gly Ser Trp Glu Val Leu Arg Phe Leu Leu Ser Asn Ala Arg
445 450 455

TGG TGG TTG GAT GAG TAC AAG TTT GAT GGG TTC AGA TTT GAT GGG GTG 1442
Trp Trp Leu Asp Glu Tyr Lys Phe Asp Gly Phe Arg Phe Asp Gly Val
460 465 470

ACT TCA ATG ATG TAC ACC CAT CAT GGA TTG CAG GTA GAT TTT ACC GGC 1490
Thr Ser Met Met Tyr Thr His His Gly Leu Gln Val Asp Phe Thr Gly
475 480 485 490

AAC TAC AAT GAA TAC TTT GGA TAT GCA ACT GAT GTA GAT GCT GTG GTT 1538
Asn Tyr Asn Glu Tyr Phe Gly Tyr Ala Thr Asp Val Asp Ala Val Val
495 500 505

TAT TTG ATG CTG TTG AAT GAT ATG ATT CAT GGT CTC TTC CCA GAG GCT 1586
Tyr Leu Met Leu Leu Asn Asp Met Ile His Gly Leu Phe Pro Glu Ala
510 515 520

GTC ACC ATT GGT GAA GAT GTT AGT GGA ATG CCA ACA GTT TGC ATT CCG 1634
Val Thr Ile Gly Glu Asp Val Ser Gly Met Pro Thr Val Cys Ile Pro
525 530 535

GTT GAA GAT GGT GGT GTT GGC TTT GAT TAT CGT CTC CAC ATG GCT GTT 1682
Val Glu Asp Gly Gly Val Gly Phe Asp Tyr Arg Leu His Met Ala Val
540 545 550

GCT GAT AAA TGG GTT GAG ATT ATT CAG AAG AGA GAT GAA GAT TGG AAA 1730
Ala Asp Lys Trp Val Glu Ile Ile Gln Lys Arg Asp Glu Asp Trp Lys
555 560 565 570

ATG GGT GAC ATT GTA CAT ATG CTG ACC AAC AGG CGG TGG TTG GAA AAG 1778
Met Gly Asp Ile Val His Met Leu Thr Asn Arg Arg Trp Leu Glu Lys
575 580 585

TGT GTT TCT TAT GCT GAA AGT CAT GAC CAG GCC CTT GTT GGT GAC AAA 1826
Cys Val Ser Tyr Ala Glu Ser His Asp Gln Ala Leu Val Gly Asp Lys
590 595 600

ACT ATT GCA TTT TGG CTG ATG GAC AAG GAT ATG TAT GAC TTC ATG GCT 1874
Thr Ile Ala Phe Trp Leu Met Asp Lys Asp Met Tyr Asp Phe Met Ala
605 610 615

CTT GAC AGA CCA TCT ACT CCT CTC ATA GAT CGT GGA GTA GCA TTG CAC 1922
Leu Asp Arg Pro Ser Thr Pro Leu Ile Asp Arg Gly Val Ala Leu His
620 625 630

AAA ATG ATC AGG CTT ATT ACC ATG GGA TTA GGC GGA GAA GGA TAT TTG 1970
Lys Met Ile Arg Leu Ile Thr Met Gly Leu Gly Gly Glu Gly Tyr Leu
635 640 645 650

AAT TTT ATG GGA AAT GAA TTT GGA CAC CCC GAG TGG ATT GAT TTT CCA 2018
Asn Phe Met Gly Asn Glu Phe Gly His Pro Glu Trp Ile Asp Phe Pro
655 660 665

AGA GGT GAT CTA CAT CTT CCC AGT GGT AAA TTT GTT CCT GGG AAC AAT 2066
Arg Gly Asp Leu His Leu Pro Ser Gly Lys Phe Val Pro Gly Asn Asn
670 675 680

TAC AGT TAT GAT AAA TGC CGG CGT AGG TTT GAT CTA GGC AAT TCA AAG 2114
Tyr Ser Tyr Asp Lys Cys Arg Arg Arg Phe Asp Leu Gly Asn Ser Lys
685 690 695

CAT CTG AGA TAT CAT GGA ATG CAA GAG TTT GAT CAA GCA ATT CAG CAT 2162
His Leu Arg Tyr His Gly Met Gln Glu Phe Asp Gln Ala Ile Gln His
700 705 710

CTT GAA GAA GCC TAT GGT TTC ATG ACT TCT GAG CAC CAA TAC ATA TCA 2210
Leu Glu Glu Ala Tyr Gly Phe Met Thr Ser Glu His Gln Tyr Ile Ser
715 720 725 730

CGG AAG GAT GAA AGG GAT CGG ATC ATT GTC TTC GAG AGG GGA AAC CTC 2258
Arg Lys Asp Glu Arg Asp Arg Ile Ile Val Phe Glu Arg Gly Asn Leu
735 740 745

GTT TTT GTA TTC AAT TTT CAT TGG ACT AGC AGC TAT TCG GAT TAC CGA 2306
Val Phe Val Phe Asn Phe His Trp Thr Ser Ser Tyr Ser Asp Tyr Arg
750 755 760

GTT GGC TGC TTA AAG CCA GGA AAG TAC AAG ATA GTC TTG GAT TCA GAT 2354
Val Gly Cys Leu Lys Pro Gly Lys Tyr Lys Ile Val Leu Asp Ser Asp
765 770 775

GAT CCT TTG TTT GGA GGC TTT GGC AGG CTT AGT CAT GAT GCA GAG CAC 2402
Asp Pro Leu Phe Gly Gly Phe Gly Arg Leu Ser His Asp Ala Glu His
780 785 790

TTC AGC TTT GAA GGG TGG TAC GAT AAC CGG CCT CGA TCC TTC ATG GTG 2450
Phe Ser Phe Glu Gly Trp Tyr Asp Asn Arg Pro Arg Ser Phe Met Val
795 800 805 810

TAC ACA CCA TGT AGA ACA GCA GTG GTC TAT GCT TTA GTG GAG GAT GAA 2498
Tyr Thr Pro Cys Arg Thr Ala Val Val Tyr Ala Leu Val Glu Asp Glu
815 820 825

GTG GAG AAT GAA TTG GAA CCT GTC GCC GGT TAA GATATATCTT AACAACAGGT 2551
Val Glu Asn Glu Leu Glu Pro Val Ala Gly *
830 835

TCTGAAGCAG GAATGCCATT ATTGATCTTC CTATGTT 2588



(2) INFORMATION FOR SEQ ID NO: 29:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 837 amino acids

(B) TYPE: amino acid
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 29:

Met Gly His Tyr Thr Ile Ser Gly Ile Arg Phe Pro Cys Ala Pro Leu
1 5 10 15

Cys Lys Ser Gln Ser Thr Gly Phe His Gly Tyr Arg Arg Thr Ser Ser
20 25 30

Cys Leu Ser Phe Asn Phe Lys Glu Ala Phe Ser Arg Arg Val Phe Ser
35 40 45

Gly Lys Ser Ser His Glu Ser Asp Ser Ser Asn Val Met Val Thr Ala
50 55 60

Ser Lys Arg Val Leu Pro Asp Gly Arg Ile Glu Cys Tyr Ser Ser Ser
65 70 75 80

Thr Asp Gln Leu Glu Ala Pro Gly Thr Val Ser Glu Glu Ser Gln Val
85 90 95

Leu Thr Asp Val Glu Ser Leu Ile Met Asp Asp Lys Ile Val Glu Asp
100 105 110

Glu Val Asn Lys Glu Ser Val Pro Met Arg Glu Thr Val Ser Ile Arg
115 120 125

Lys Ile Gly Ser Lys Pro Arg Ser Ile Pro Pro Pro Gly Arg Gly Gln
130 135 140

Arg Ile Tyr Asp Ile Asp Pro Ser Leu Thr Gly Phe Arg Gln His Leu
145 150 155 160

Asp Tyr Arg Tyr Ser Gln Tyr Lys Arg Leu Arg Glu Glu Ile Asp Lys
165 170 175

Tyr Glu Gly Ser Leu Asp Ala Phe Ser Arg Gly Tyr Glu Lys Phe Gly
180 185 190

Phe Ser Arg Ser Glu Thr Gly Ile Thr Tyr Arg Glu Trp Ala Pro Gly
195 200 205

Ala Thr Trp Ala Ala Leu Ile Gly Asp Phe Asn Asn Trp Asn Pro Asn
210 215 220

Ala Asp Val Met Thr Gln Asn Glu Cys Gly Val Trp Glu Ile Phe Leu
225 230 235 240

Pro Asn Asn Ala Asp Gly Ser Pro Pro Ile Pro His Gly Ser Arg Val
245 250 255

Lys Ile Arg Met Asp Thr Pro Ser Gly Asn Lys Asp Ser Ile Pro Ala
260 265 270

Trp Ile Lys Phe Ser Val Gln Ala Pro Gly Glu Leu Pro Tyr Asn Gly
275 280 285

Ile Tyr Tyr Asp Pro Pro Glu Glu Glu Lys Tyr Val Phe Lys Asn Pro
290 295 300

Gln Pro Lys Arg Pro Lys Ser Leu Arg Ile Tyr Glu Ser His Val Gly
305 310 315 320

Met Ser Ser Thr Glu Pro Val Ile Asn Thr Tyr Ala Asn Phe Arg Asp
325 330 335

Asp Val Leu Pro Arg Ile Lys Lys Leu Gly Tyr Asn Ala Val Gln Leu
340 345 350

Met Ala Ile Gln Glu His Ser Tyr Tyr Ala Ser Phe Gly Tyr His Val
355 360 365

Thr Asn Phe Tyr Ala Ala Ser Ser Arg Phe Gly Thr Pro Asp Asp Leu
370 375 380

Lys Ser Leu Ile Asp Lys Ala His Glu Leu Gly Leu Leu Val Leu Met
385 390 395 400

Asp Ile Val His Ser His Ala Ser Thr Asn Thr Leu Asp Gly Leu Asn
405 410 415

Met Phe Asp Gly Thr Asp Gly His Tyr Phe His Ser Gly Pro Arg Gly
420 425 430

His His Trp Met Trp Asp Ser Arg Leu Phe Asn Tyr Gly Ser Trp Glu
435 440 445

Val Leu Arg Phe Leu Leu Ser Asn Ala Arg Trp Trp Leu Asp Glu Tyr
450 455 460

Lys Phe Asp Gly Phe Arg Phe Asp Gly Val Thr Ser Met Met Tyr Thr
465 470 475 480

His His Gly Leu Gln Val Asp Phe Thr Gly Asn Tyr Asn Glu Tyr Phe
485 490 495

Gly Tyr Ala Thr Asp Val Asp Ala Val Val Tyr Leu Met Leu Leu Asn
500 505 510

Asp Met Ile His Gly Leu Phe Pro Glu Ala Val Thr Ile Gly Glu Asp
515 520 525

Val Ser Gly Met Pro Thr Val Cys Ile Pro Val Glu Asp Gly Gly Val
530 535 540

Gly Phe Asp Tyr Arg Leu His Met Ala Val Ala Asp Lys Trp Val Glu
545 550 555 560

Ile Ile Gln Lys Arg Asp Glu Asp Trp Lys Met Gly Asp Ile Val His
565 570 575

Met Leu Thr Asn Arg Arg Trp Leu Glu Lys Cys Val Ser Tyr Ala Glu
580 585 590

Ser His Asp Gln Ala Leu Val Gly Asp Lys Thr Ile Ala Phe Trp Leu
595 600 605

Met Asp Lys Asp Met Tyr Asp Phe Met Ala Leu Asp Arg Pro Ser Thr
610 615 620

Pro Leu Ile Asp Arg Gly Val Ala Leu His Lys Met Ile Arg Leu Ile
625 630 635 640

Thr Met Gly Leu Gly Gly Glu Gly Tyr Leu Asn Phe Met Gly Asn Glu
645 650 655

Phe Gly His Pro Glu Trp Ile Asp Phe Pro Arg Gly Asp Leu His Leu
660 665 670

Pro Ser Gly Lys Phe Val Pro Gly Asn Asn Tyr Ser Tyr Asp Lys Cys
675 680 685

Arg Arg Arg Phe Asp Leu Gly Asn Ser Lys His Leu Arg Tyr His Gly
690 695 700

Met Gln Glu Phe Asp Gln Ala Ile Gln His Leu Glu Glu Ala Tyr Gly
705 710 715 720

Phe Met Thr Ser Glu His Gln Tyr Ile Ser Arg Lys Asp Glu Arg Asp
725 730 735

Arg Ile Ile Val Phe Glu Arg Gly Asn Leu Val Phe Val Phe Asn Phe
740 745 750

His Trp Thr Ser Ser Tyr Ser Asp Tyr Arg Val Gly Cys Leu Lys Pro
755 760 765

Gly Lys Tyr Lys Ile Val Leu Asp Ser Asp Asp Pro Leu Phe Gly Gly
770 775 780

Phe Gly Arg Leu Ser His Asp Ala Glu His Phe Ser Phe Glu Gly Trp
785 790 795 800

Tyr Asp Asn Arg Pro Arg Ser Phe Met Val Tyr Thr Pro Cys Arg Thr
805 810 815

Ala Val Val Tyr Ala Leu Val Glu Asp Glu Val Glu Asn Glu Leu Glu
820 825 830

Pro Val Ala Gly
835



(2) INFORMATION FOR SEQ ID NO : 30: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 2 805 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

(ix) FEATURE: 

(A) NAME / KEY : CDS 

(B) LOCATION: 131. .2677 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO : 30: 

AGTGAATTCG AGCTCGGTAC CCGGGGATCC GATTCGCATT TCTCGCTATT 
GCTTTCCGTT 60 

TATTTCCATA TATAAAATAT CAAATCTAAT CACTTGCGCC ATTTCTATCT 
CTCTCCAAAC 120 

TCTCACCGAA ATG GTA TAC TAC ACT GTA TCA GGC ATA CGT TTT CCT 
TGT 169 

Met Val Tyr Tyr Thr Val Ser Gly Ile Arg Phe Pro 
Cys 

840 845 850 

GCA CCT TCA CTC TAC AAA TCT CAG CTC ACC AGC TTC CAT GGC GGT 
CGA 217 

Ala Pro Ser Leu Tyr Lys Ser Gln Leu Thr Ser Phe His Gly Gly 
Arg 

855 860 865 

AGG ACC TCT TCT GGC CTT TCC TTC CTC TTG AAG AAG GAG CTG TTT 
CCT 265 

Arg Thr Ser Ser Gly Leu Ser Phe Leu Leu Lys Lys Glu Leu Phe 
Pro 

870 875 880 

CGG AAG ATC TTT GCT GGA AAG TCC TCT TAT GAA TCT GAC TCC TCA 
AAT 313 

Arg Lys Ile Phe Ala Gly Lys Ser Ser Tyr Glu Ser Asp Ser Ser 
Asn 

885 890 895 

TTA ACT GTC TCT GCA TCT GAG AAG GTC CTT GTT CCT GAT GAT CAG 
ATT 361 

Leu Thr Val Ser Ala Ser Glu Lys Val Leu Val Pro Asp Asp Gln 
Ile 

900 905 910 

GAT GGC TCT TCT TCT TCA ACA TAT CAA TTA GAA ACC ACT GGC ACA 
GTT 409 

Asp Gly Ser Ser Ser Ser Thr Tyr Gln Leu Glu Thr Thr Gly Thr 
Val 

915 920 925 

930 

TTG GAG GAA TCC CAG GTT CTT GGT GAT GCA GAG AGT CTT GTG ATG 
GAA 457 

Leu Glu Glu Ser Gln Val Leu Gly Asp Ala Glu Ser Leu Val Met 
Glu 

935 940 945 

GAT GAT AAG AAT GTT GAG GAG GAT GAA GTA AAA AAA GAG TCG GTT 
CCA 505 

Asp Asp Lys Asn Val Glu Glu Asp Glu Val Lys Lys Glu Ser Val 
Pro 

950 955 960 

TTG CAT GAG ACA ATT AGC ATT GGA AAA AGT GAA TCT AAA CCA AGG 
TCC 553 

Leu His Glu Thr Ile Ser Ile Gly Lys Ser Glu Ser Lys Pro Arg 
Ser 

965 970 975 



ATT CCT CCA CCT GGC AGT GGG CAG AGA ATA TAT GAC ATA GAT CCA 
AGC 601 

Ile Pro Pro Pro Gly Ser Gly Gln Arg Ile Tyr Asp Ile Asp Pro 
Ser 

980 985 990 



TTG GCA GGT TTC CGT CAG CAT CTT GAC TAC CGA TAT TCA CAG TAC 
AAA 649 

Leu Ala Gly Phe Arg Gln His Leu Asp Tyr Arg Tyr Ser Gln Tyr 
Lys 

995 1000 1005 

1010 

AGG CTG CGT GAG GAA ATT GAC AAG TAT GAA GGT GGT TTG GAT GCA 
TTC 697 

Arg Leu Arg Glu Glu Ile Asp Lys Tyr Glu Gly Gly Leu Asp Ala 
Phe 

1015 1020 1025 



TCT CGT GGA TTT GAA AAG TTT GGT TTC TTA CGC AGT GAA ACA GGA 
ATA 745 

Ser Arg Gly Phe Glu Lys Phe Gly Phe Leu Arg Ser Glu Thr Gly 
Ile 

1030 1035 1040 



ACT TAT AGG GAA TGG GCA CCT GGA GCT ACG TGG GCT GCA CTT ATT 
GGA 793 

Thr Tyr Arg Glu Trp Ala Pro Gly Ala Thr Trp Ala Ala Leu Ile 
Gly 

1045 1050 1055 



GAT TTC AAC AAT TGG AAT CCT AAT GCA GAT GTC ATG ACT CGG AAT 
GAG 841 

Asp Phe Asn Asn Trp Asn Pro Asn Ala Asp Val Met Thr Arg Asn 
Glu 

1060 1065 1070 



TTT GGT GTC TGG GAG ATT TTT TTG CCA AAT AAC GCA GAT GGT TCA 
CCA 889 

Phe Gly Val Trp Glu Ile Phe Leu Pro Asn Asn Ala Asp Gly Ser 
Pro 

1075 1080 1085 

1090 
CCA ATT CCT CAT GGT TCT CGA GTA AAG ATA CGC ATG GAT ACT CCA 
TCT 937 

Pro Ile Pro His Gly Ser Arg Val Lys Ile Arg Met Asp Thr Pro 
Ser 

1095 1100 1105 

GGC ATC AAA GAT TCA ATT CCT GCT TGG ATC AAG TTC TCA GTT CAG 
GCA 985 

Gly Ile Lys Asp Ser Ile Pro Ala Trp Ile Lys Phe Ser Val Gln 
Ala 

1110 1115 1120 



CCT GGT GAA ATC CCA TAC AAT GCC ATA TAC TAT GAT CCA CCA AAG 
GAG 1033 

Pro Gly Glu Ile Pro Tyr Asn Ala Ile Tyr Tyr Asp Pro Pro Lys 
Glu 

1125 1130 1135 



GAG AAG TAT GTG TTC AAA CAT CCT CAG CCA AAG AGA CCA AAA TCA 
CTT 1081 

Glu Lys Tyr Val Phe Lys His Pro Gln Pro Lys Arg Pro Lys Ser 
Leu 

1140 1145 1150 



AGG ATT TAT GAA TCT CAT GTT GGG ATG AGT AGT ATG GAG CCA ATA 
ATT 1129 

Arg Ile Tyr Glu Ser His Val Gly Met Ser Ser Met Glu Pro Ile 
Ile 

1155 1160 1165 

1170 

AAC ACA TAT GCC AAC TTT AGA GAT GAT ATG CTT CCT CGC ATC AAA 
AAG 1177 

Asn Thr Tyr Ala Asn Phe Arg Asp Asp Met Leu Pro Arg Ile Lys 
Lys 

1175 1180 1185 



CTT GGC TAC AAT GCT GTT CAG ATC ATG GCT ATT CAA GAG CAT TCC 
TAT 1225 

Leu Gly Tyr Asn Ala Val Gln Ile Met Ala Ile Gln Glu His Ser 
Tyr 

1190 1195 1200 



TAT GCT AGT TTT GGG TAC CAT GTC ACA AAC TTT TTT GCA CCT AGC 
AGC 1273 

Tyr Ala Ser Phe Gly Tyr His Val Thr Asn Phe Phe Ala Pro Ser 
Ser 

1205 1210 1215 



CGA TTT GGA ACT CCT GAT GAT TTG AAG TCT TTA ATA GAT AAA GCT 
CAT 1321 

Arg Phe Gly Thr Pro Asp Asp Leu Lys Ser Leu Ile Asp Lys Ala 
His 

1220 1225 1230 

GAG TTA GGG CTG CTT GTT CTC ATG GAT ATT GTT CAT AGC CAT GCG 
TCA 1369 

Glu Leu Gly Leu Leu Val Leu Met Asp Ile Val His Ser His Ala 
Ser 

1235 1240 1245 

1250 

AAT AAT ACG TTG GAT GGG CTG AAC ATG TTT GAT GGT ACG GAT AGT 
CAC 1417 

Asn Asn Thr Leu Asp Gly Leu Asn Met Phe Asp Gly Thr Asp Ser 
His 

1255 1260 1265 



TAC TTC CAC TCC GGA TCA CGG GGT CAT CAT TGG TTG TGG GAC TCT 
CGC 1465 

Tyr Phe His Ser Gly Ser Arg Gly His His Trp Leu Trp Asp Ser 
Arg 

1270 1275 1280 



CTT TTC AAC TAT GGA AGC TGG GAG GTG CTA AGA TTT CTT CTT TCA 
AAT 1513 

Leu Phe Asn Tyr Gly Ser Trp Glu Val Leu Arg Phe Leu Leu Ser 
Asn 

1285 1290 1295 



GCA AGA TGG TGG TTG GAA GAG TAC AGG TTT GAT GGT TTT AGA TTT 
GAT 1561 

Ala Arg Trp Trp Leu Glu Glu Tyr Arg Phe Asp Gly Phe Arg Phe 
Asp 

1300 1305 1310 



GGG GTG ACT TCC ATG ATG TAC ACT CCC CAT GGG TTG CAG GTA GCT 
TTT 1609 

Gly Val Thr Ser Met Met Tyr Thr Pro His Gly Leu Gln Val Ala 
Phe 

1315 1320 1325 

1330 
ACT GGC AAC TAC AAT GAG TAC TTT GGA TAT GCA ACT GAT GTA GAT 
GCT 1657 

Thr Gly Asn Tyr Asn Glu Tyr Phe Gly Tyr Ala Thr Asp Val Asp 
Ala 

1335 1340 1345 



GTG ATT TAT TTG ATG CTT GTG AAT GAT ATG ATT CAC GGT CTT TTC 
CCT 1705 

Val Ile Tyr Leu Met Leu Val Asn Asp Met Ile His Gly Leu Phe 
Pro 

1350 1355 1360 



GAG GCT GTT ACC ATT GGT GAA GAT GTT AGC GGA AAG CCA ACA TTT 
TGC 1753 

Glu Ala Val Thr Ile Gly Glu Asp Val Ser Gly Lys Pro Thr Phe 
Cys 

1365 1370 1375 



ATT CCA GTG GAA GAT GGT GGT GTT GGA TTT GAT TAC CGT CTC CAC 
ATG 1801 

Ile Pro Val Glu Asp Gly Gly Val Gly Phe Asp Tyr Arg Leu His 
Met 

1380 1385 1390 



GCC ATT GCC GAT AAA TGG ATT GAG ATT CTT AAG AAG AGA GAT GAG 
GAC 1849 

Ala Ile Ala Asp Lys Trp Ile Glu Ile Leu Lys Lys Arg Asp Glu 
Asp 

1395 1400 1405 

1410 

TGG AAA ATG GGT GAC ATT GTG CAT ACA CTC ACC AAC AGA AGG TGG 
TTG 1897 

Trp Lys Met Gly Asp Ile Val His Thr Leu Thr Asn Arg Arg Trp 
Leu 

1415 1420 1425 



GAA AAA TGT GTT GCT TAT GCT GAA AGT CAT GAC CAA GCT CTT GTT 
GGT 1945 

Glu Lys Cys Val Ala Tyr Ala Glu Ser His Asp Gln Ala Leu Val 
Gly 

1430 1435 1440 



GAC AAA ACT ATT GCA TTT TGG CTG ATG GAC AAG GAC ATG TAC GAC 
TTC 1993 

Asp Lys Thr Ile Ala Phe Trp Leu Met Asp Lys Asp Met Tyr Asp 
Phe 

1445 1450 1455 

ATG GCT CGT GAC AGA CCA TCT ACT CCT CTT ATA GAT CGT GGA ATA 
GCA 2041 

Met Ala Arg Asp Arg Pro Ser Thr Pro Leu Ile Asp Arg Gly Ile 
Ala 

1460 1465 1470 

TTG CAC AAA ATG ATC AGG CTT ATT ACC ATG GGC TTA GGC GGA GAA 
GGA 2089 

Leu His Lys Met Ile Arg Leu Ile Thr Met Gly Leu Gly Gly Glu 
Gly 

1475 1480 1485 

1490 

TAT TTG AAT TTT ATG GGA AAT GAA TTT GGA CAT CCT GAG TGG ATT 
GAT 2137 

Tyr Leu Asn Phe Met Gly Asn Glu Phe Gly His Pro Glu Trp Ile 
Asp 



1495 1500 1505 



TTT CCA AGA GGG GAT CGA CAT CTG CCC AAT GGT AAA GTA ATT CCA 
GGG 2185 

Phe Pro Arg Gly Asp Arg His Leu Pro Asn Gly Lys Val Ile Pro 
Gly 

1510 1515 1520 

AAC AAC CAC AGT TAT GAT AAA TGC CGT CGT AGA TTT GAT CTA GGT 
GAT 2233 

Asn Asn His Ser Tyr Asp Lys Cys Arg Arg Arg Phe Asp Leu Gly 
Asp 

1525 1530 1535 



GCA GAC TAT CTA AGA TAT CAT GGA ATG CAA GAG TTT GAT CAG GCA 
ATG 2281 

Ala Asp Tyr Leu Arg Tyr His Gly Met Gln Glu Phe Asp Gln Ala 
Met 



1540 1545 1550 



CAA CAT CTT GAA GAA GCC TAT GGT TTC ATG ACT TCT GAG CAC CAG 
TAT 2329 

Gln His Leu Glu Glu Ala Tyr Gly Phe Met Thr Ser Glu His Gln 
Tyr 

1555 1560 1565 

1570 
ATA TCA CGG AAG GAT GAA GGA GAT CGG ATC ATT GTC TTT GAG AGG 
GGA 2377 

Ile Ser Arg Lys Asp Glu Gly Asp Arg Ile Ile Val Phe Glu Arg 
Gly 

1575 1580 1585 

AAC CTT GTT TTT GTA TTC AAC TTT CAT TGG ACT AAC AGC TAT TCA 
GAT 2425 

Asn Leu Val Phe Val Phe Asn Phe His Trp Thr Asn Ser Tyr Ser 

Asp 

1590 1595 1600 



TAC CGA GTT GGC TGC TTC AAG TCA GGA AAG TAC AAG ATT GTT TTG 
GAC 2473 

Tyr Arg Val Gly Cys Phe Lys Ser Gly Lys Tyr Lys Ile Val Leu 
Asp 

1605 1610 1615 



TCG GAT GAT GGC TTG TTT GGA GGC TTC AAC AGG CTT AGT CAT GAT 
GCC 2521 

Ser Asp Asp Gly Leu Phe Gly Gly Phe Asn Arg Leu Ser His Asp 
Ala 

1620 1625 1630 



GAG CAC TTC ACC TTT GAC GGG TGG TAT GAT AAC CGG CCT CGG TCC 
TTC 2569 

Glu His Phe Thr Phe Asp Gly Trp Tyr Asp Asn Arg Pro Arg Ser 
Phe 

1635 1640 1645 

1650 

ATG GTA TAT GCA CCA TCT AGG ACA GCA GTG GTC TAT GCT TTA GTA 
GAA 2617 

Met Val Tyr Ala Pro Ser Arg Thr Ala Val Val Tyr Ala Leu Val 
Glu 

1655 1660 1665 



GAT GAA GAG AAT GAA GCA GAG AAT GAA GTA GAA AGT GAA GTG AAA 
CCA 2665 

Asp Glu Glu Asn Glu Ala Glu Asn Glu Val Glu Ser Glu Val Lys 
Pro 

1670 1675 1680 



GCC TCC GGC TGA GATAGATATT TAGTAAGAGG ATCCCCTAAA GCAGGAATGG 
2717 



Ala Ser Gly * 
1685 
TTAACCTGTG CATCTGCATT GAACGACGTA TATTGAGACT GGAAATCCAT 
ATGACTAGTA 2777 

GATCCTCTAG AGTCGACCTG CAGGCATG 
2805 



(2) INFORMATION FOR SEQ ID NO: 31: 



(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 849 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 31: 



Met Val Tyr Tyr Thr Val Ser Gly Ile Arg Phe Pro Cys Ala Pro 
Ser 

1 5 10 15 



Leu Tyr Lys Ser Gln Leu Thr Ser Phe His Gly Gly Arg Arg Thr 
Ser 

20 25 30 

Ser Gly Leu Ser Phe Leu Leu Lys Lys Glu Leu Phe Pro Arg Lys 
Ile 

35 40 45 



Phe Ala Gly Lys Ser Ser Tyr Glu Ser Asp Ser Ser Asn Leu Thr 
Val 

50 55 60 



Ser Ala Ser Glu Lys Val Leu Val Pro Asp Asp Gln Ile Asp Gly 
Ser 

65 70 75 

80 

Ser Ser Ser Thr Tyr Gln Leu Glu Thr Thr Gly Thr Val Leu Glu 
Glu 

85 90 95 

Ser Gln Val Leu Gly Asp Ala Glu Ser Leu Val Met Glu Asp Asp 
Lys 
100 105 110 



Asn Val Glu Glu Asp Glu Val Lys Lys Glu Ser Val Pro Leu His 
Glu 

115 120 125 



Thr Ile Ser Ile Gly Lys Ser Glu Ser Lys Pro Arg Ser Ile Pro 
Pro 

130 135 140 



Pro Gly Ser Gly Gln Arg Ile Tyr Asp Ile Asp Pro Ser Leu Ala 
Gly 

145 150 155 

160 

Phe Arg Gln His Leu Asp Tyr Arg Tyr Ser Gln Tyr Lys Arg Leu 
Arg 

165 170 175 



Glu Glu Ile Asp Lys Tyr Glu Gly Gly Leu Asp Ala Phe Ser Arg 
Gly 

180 185 190 



Phe Glu Lys Phe Gly Phe Leu Arg Ser Glu Thr Gly Ile Thr Tyr 
Arg 

195 200 205 



Glu Trp Ala Pro Gly Ala Thr Trp Ala Ala Leu Ile Gly Asp Phe 
Asn 

210 215 220 



Asn Trp Asn Pro Asn Ala Asp Val Met Thr Arg Asn Glu Phe Gly 
Val 

225 230 235 

240 



Trp Glu Ile Phe Leu Pro Asn Asn Ala Asp Gly Ser Pro Pro Ile 
Pro 

245 250 255 



His Gly Ser Arg Val Lys Ile Arg Met Asp Thr Pro Ser Gly Ile 
Lys 

260 265 270 
Asp Ser Ile Pro Ala Trp Ile Lys Phe Ser Val Gln Ala Pro Gly 
Glu 



275 280 285 



Ile Pro Tyr Asn Ala Ile Tyr Tyr Asp Pro Pro Lys Glu Glu Lys 
Tyr 

290 295 300 



Val Phe Lys His Pro Gln Pro Lys Arg Pro Lys Ser Leu Arg Ile 
Tyr 

305 310 315 

320 



Glu Ser His Val Gly Met Ser Ser Met Glu Pro Ile Ile Asn Thr 
Tyr 

325 330 335 



Ala Asn Phe Arg Asp Asp Met Leu Pro Arg Ile Lys Lys Leu Gly 
Tyr 

340 345 350 



Asn Ala Val Gln Ile Met Ala Ile Gln Glu His Ser Tyr Tyr Ala 
Ser 



355 360 365 



Phe Gly Tyr His Val Thr Asn Phe Phe Ala Pro Ser Ser Arg Phe 
Gly 

370 375 380 



Thr Pro Asp Asp Leu Lys Ser Leu Ile Asp Lys Ala His Glu Leu 
Gly 

385 390 395 

400 



Leu Leu Val Leu Met Asp Ile Val His Ser His Ala Ser Asn Asn 
Thr 



405 410 415 



Leu Asp Gly Leu Asn Met Phe Asp Gly Thr Asp Ser His Tyr Phe 
His 



420 425 430 
Ser Gly Ser Arg Gly His His Trp Leu Trp Asp Ser Arg Leu Phe 
Asn 

435 440 445 



Tyr Gly Ser Trp Glu Val Leu Arg Phe Leu Leu Ser Asn Ala Arg 
Trp 

450 455 460 



Trp Leu Glu Glu Tyr Arg Phe Asp Gly Phe Arg Phe Asp Gly Val 
Thr 

465 470 475 

480 



Ser Met Met Tyr Thr Pro His Gly Leu Gln Val Ala Phe Thr Gly 
Asn 



485 490 495 



Tyr Asn Glu Tyr Phe Gly Tyr Ala Thr Asp Val Asp Ala Val Ile 
Tyr 

500 505 510 



Leu Met Leu Val Asn Asp Met Ile His Gly Leu Phe Pro Glu Ala 
Val 



515 520 525 



Thr Ile Gly Glu Asp Val Ser Gly Lys Pro Thr Phe Cys Ile Pro 
Val 

530 535 540 



Glu Asp Gly Gly Val Gly Phe Asp Tyr Arg Leu His Met Ala Ile 
Ala 

545 550 555 

560 



Asp Lys Trp Ile Glu Ile Leu Lys Lys Arg Asp Glu Asp Trp Lys 
Met 



565 570 575 



Gly Asp Ile Val His Thr Leu Thr Asn Arg Arg Trp Leu Glu Lys 
Cys 

580 585 590 



Val Ala Tyr Ala Glu Ser His Asp Gln Ala Leu Val Gly Asp Lys 
Thr 



595 600 605 



Ile Ala Phe Trp Leu Met Asp Lys Asp Met Tyr Asp Phe Met Ala 
Arg 

610 615 620 



Asp Arg Pro Ser Thr Pro Leu Ile Asp Arg Gly Ile Ala Leu His 
Lys 

625 630 635 

640 



Met Ile Arg Leu Ile Thr Met Gly Leu Gly Gly Glu Gly Tyr Leu 
Asn 



645 650 655 



Phe Met Gly Asn Glu Phe Gly His Pro Glu Trp Ile Asp Phe Pro 
Arg 

660 665 670 



Gly Asp Arg His Leu Pro Asn Gly Lys Val Ile Pro Gly Asn Asn 
His 



675 680 685 



Ser Tyr Asp Lys Cys Arg Arg Arg Phe Asp Leu Gly Asp Ala Asp 
Tyr 

690 695 700 



Leu Arg Tyr His Gly Met Gln Glu Phe Asp Gln Ala Met Gln His 
Leu 

705 710 715 

720 



Glu Glu Ala Tyr Gly Phe Met Thr Ser Glu His Gln Tyr Ile Ser 
Arg 

725 730 735 



Lys Asp Glu Gly Asp Arg Ile Ile Val Phe Glu Arg Gly Asn Leu 
Val 



740 745 750 



Phe Val Phe Asn Phe His Trp Thr Asn Ser Tyr Ser Asp Tyr Arg 
Val 

755 760 765 
Gly Cys Phe Lys Ser Gly Lys Tyr Lys Ile Val Leu Asp Ser Asp 
Asp 

770 775 780 



Gly Leu Phe Gly Gly Phe Asn Arg Leu Ser His Asp Ala Glu His 
Phe 

785 790 795 

800 

Thr Phe Asp Gly Trp Tyr Asp Asn Arg Pro Arg Ser Phe Met Val 
Tyr 

805 810 815 



Ala Pro Ser Arg Thr Ala Val Val Tyr Ala Leu Val Glu Asp Glu 
Glu 

820 825 830 



Asn Glu Ala Glu Asn Glu Val Glu Ser Glu Val Lys Pro Ala Ser 
Gly 

835 840 845 



ABSTRACT 






Title: Improvements in or Relating to Starch Content of Plants 

Disclosed is a nucleic acid sequence encoding a polypeptide having starch branching 
enzyme (SBE) activity, the encoded polypeptide comprising an effective portion of the 
amino acid sequence shown in Figure 4 (SEQ ID NO: 29) or Figure 13 (SEQ ID NO: 31). 



